TITLE: System for Electronically Managing, Finding, and/or Displaying Biomolecular 
Interactions 

A portion of the disclosure of this patent document contains material which is subject to
copyright protection. The copyright owner has no objection to the facsimile reproduction by any one of 
the patent disclosure, as it appears in the Patent and Trademark files or records, but otherwise reserves 
all copyright rights whatsoever. 
FIELD OF THE INVENTION

The invention relates to a system, methods and products for managing, finding, and/or 
displaying biomolecular interactions. 
BACKGROUND OF THE INVENTION

Technological advances and mounting interest have pushed proteomics into the scientific 
spotlight. This growing field encompasses the study of proteins, both in structure and in function, 
contained in a proteome - the protein equivalent of a genome. Because of increased interest and
technique automation (Mendelsohn et al., 1999), the rate of proteomic data production is growing in
much the same way as that of genomics did a decade ago. For example, mass spectrometers, gene chips, and
two-hybrid systems have made cellular signaling pathway mapping faster and easier and consequently 
these are becoming large producers of data. Protein-protein interaction and more general
biomolecule-biomolecule (protein-DNA, protein-RNA, protein-small molecule, etc.) interaction information is being
generated and recorded in the literature. Lessons from the genomic era have taught us that large
amounts of related data recorded in scientific journals soon become unmanageable. A well designed
common data specification based on a model of the biological information is therefore required to 
describe and store biomolecular interaction data. 
SUMMARY OF THE INVENTION 

The present inventors have designed a data specification for the storage and management of 
biomolecular interaction and biochemical pathway data that possesses the following properties: 

1. It describes the full complexity of the biological data, from simple binary interactions to
large-scale molecular complexes and networks of pathways and interactions. It stores protein, DNA,
RNA, and other molecules in full atomic detail, since character based sequence abstractions of
biomolecules often miss important chemical features, such as methylation on DNA. This allows as
much data as possible to be stored for scientific use in electronic form rather than in print. 

2. It is easily computable. A computer can easily read, write, and traverse the specification. 
This facilitates maintenance of a database of such information, creation of advanced queries and 
querying tools and development of computer programs that use the information for data visualization, 
data mining, and visual data entry. 

3. It is platform and database independent. Tools written for one platform can read data 
created on another platform directly. It handles the data structure without modification as well. 

4. It is succinct and easy for humans to understand. Field to data correspondence is very clear 
and a human readable format of the specification is available. 



The data structure was designed for a database referred to herein as "BIND" (Biomolecular
Interaction Network Database). The data structure is written in a data specification language called
Abstract Syntax Notation.1 (ASN.1, also known as X.208 or ISO-8824)
(http://www.oss.com/asn1/index.html). The U.S. National Center for Biotechnology Information
(NCBI) uses ASN.1 to describe and store all of its biological and publication data and all of GenBank,
MMDB and PubMed (Ostell and Kans, 1998). BIND inherits the NCBI data model, which provides a
solid foundation for the BIND data specification through the use of mature NCBI data types that
describe sequence, 3D structure, and publication reference information.

Although the specification is written in ASN.1, it is not restricted to this syntax. The data
structures can be readily translated to other common data specification languages such as CORBA IDL
(Object Management Group, 1996) or XML (http://www.w3.org/XML) if the need arises. Aside from
ASN.1, no other biological data specification is sufficiently rich in mature data types to use as a
foundation for BIND without first building and testing those base data types. 

The BIND data specification represents complex cellular pathway information efficiently in a 
computer. BIND defines three main data types: interactions, molecular complexes, and pathways.
Each of these objects is composed of various component and descriptor objects that are either defined
in the specification proper or inherited from the NCBI ASN.1 data specifications. For example, an
interaction record contains, among other data objects, two BIND-objects. A BIND-object describes a
molecule of any type and is itself defined using simpler sub-objects. Normally, a BIND-object
describing a biopolymer sequence will store a simple link to a sequence database, such as GenBank
(Benson et al., 1999). If, however, the sequence is not present in a public database, it can be fully
represented using an embedded NCBI-Bioseq object. The NCBI-Bioseq object is how NCBI stores all
of the sequences in GenBank and is a mature data structure. BIND also inherits the NCBI taxonomy
model (also used and supported by EMBL, DDBJ and Swiss-Prot) and data, via an inherited
NCBI-BioSource, and is designed so that interactions can be both inter- and intra-organismal. Sequence,
structure, publication, taxonomy and small molecule databases provide a strong foundation for BIND. 

Broadly stated, the present invention contemplates a system for electronically managing, 
finding, and/or visualizing biomolecular interactions comprising a computer system including at least 
one computer receiving data on biomolecular interactions from a plurality of providers and processing 
such data to create and maintain images and/or text defining biomolecular interactions, said computer 
system, in response to requests, creating and transmitting to a plurality of end-users, the images
and/or text defining biomolecular interactions. 

In an embodiment, a system for electronically managing, finding, and/or visualizing
biomolecular interactions is provided comprising:

(a) a maintenance entity for receiving data on biomolecular interactions from a plurality of
providers and means for receiving and processing such data to create and maintain
images and/or text defining biomolecular interactions; and

(b) one or more computer systems maintained by the maintenance entity and having means
for creating and transmitting to a plurality of end-users the images and/or text defining
biomolecular interactions.

The system is useful in managing, finding, and/or displaying biomolecular interactions
including interactions involving proteins, nucleic acids (RNA, DNA), and ligands, molecular
complexes, and signaling pathways. The interactions are defined both at the molecular and atomic
levels and in particular they may be defined by chemical graphs.

The invention also provides a method for displaying on a computer screen information
concerning biomolecular interactions comprising retrieving an image and/or text defining a
biomolecular interaction from a system of the invention.

The present invention also provides a data structure stored in the memory of a computer, the
data structure having a plurality of records and each record containing a biomolecular interaction and
information relating to the biomolecular interaction. In an embodiment the biomolecular interaction is
identified by chemical graphs. The information in the data structure may be accessible by using indices
which may represent selections of information from the chemical graphs.

The term "record" used herein generally refers to a row in a database table. Each record 
contains one or more fields or attributes. A given record may be uniquely specified by one or a 
combination of fields or attributes known as the record's primary key. A record of a biomolecular
interaction as used herein is generally a record containing information identifying the biomolecular
interaction as a chemical graph and a plurality of other attributes with information pertaining to the
biomolecular interaction (e.g. information on the cellular place of interaction, experimental conditions
used to observe the interaction, conserved sequence comment of molecules in the interaction if they are
biological sequences, information on molecules in the interaction, description of metabolic and
signaling pathways, cell cycle stages in which an interaction is involved, locations of binding sites on
the molecules in an interaction, chemical actions mediated by the interactions, and chemical states of
the molecules in the interaction).

The term "chemical graph" refers to a connectivity graph of all the atoms and bonds in a
molecule in a biomolecular interaction. The graph may include three-dimensional coordinates.
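
As a minimal sketch of this definition, a chemical graph could be held as a list of atoms plus a list of bonds, with three-dimensional coordinates left optional. The names `Atom`, `Bond` and `ChemicalGraph` are hypothetical and serve only to illustrate the idea; they are not taken from the BIND specification, which inherits its structure types from NCBI.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Atom:
    element: str
    # Optional 3D position; None when only connectivity is known.
    coords: Optional[Tuple[float, float, float]] = None


@dataclass(frozen=True)
class Bond:
    a: int          # index of the first atom in ChemicalGraph.atoms
    b: int          # index of the second atom
    order: int = 1  # 1 = single, 2 = double, ...


@dataclass
class ChemicalGraph:
    atoms: List[Atom] = field(default_factory=list)
    bonds: List[Bond] = field(default_factory=list)

    def neighbors(self, i: int) -> List[int]:
        """Return the indices of all atoms bonded to atom i."""
        out = []
        for bond in self.bonds:
            if bond.a == i:
                out.append(bond.b)
            elif bond.b == i:
                out.append(bond.a)
        return sorted(out)


# Water as a connectivity-only graph: one oxygen bonded to two hydrogens.
water = ChemicalGraph(
    atoms=[Atom("O"), Atom("H"), Atom("H")],
    bonds=[Bond(0, 1), Bond(0, 2)],
)
```

Keeping coordinates optional mirrors the statement that the graph "may include" them: the same structure serves for a bare connectivity graph and for a full atomic model.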

The invention also provides a method for storing a representation of a biomolecular
interaction in a memory of a computer system, the method executed on a computer system and
comprising the steps of:

(a) identifying a chemical graph of a biomolecular interaction; and

(b) storing a record in a data structure of the invention. 

The invention further provides a method for storing a representation of a biomolecular interaction
in a memory of a computer system, the method executed on a computer system and comprising the
steps of:

(a) identifying a chemical graph of a biomolecular interaction;

(b) generating one or more indices from information in the chemical graph; and

(c) storing a record in a data structure of the invention.

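
Steps (a) to (c) might be sketched as follows. The text does not say which selections of the chemical graph are indexed, so the element-composition key used here is purely an illustrative assumption, as are the names `composition_key` and `InteractionStore`.

```python
from collections import Counter, defaultdict


def composition_key(atoms):
    """Build a canonical key such as 'H2O1' from a list of element symbols."""
    counts = Counter(atoms)
    return "".join(f"{el}{counts[el]}" for el in sorted(counts))


class InteractionStore:
    """Toy record store with one index derived from the chemical graphs."""

    def __init__(self):
        self.records = {}              # record id -> record
        self.index = defaultdict(set)  # composition key -> record ids

    def store(self, record_id, graph_a, graph_b, info=None):
        # Each graph is an (atoms, bonds) pair identified in step (a).
        self.records[record_id] = {"a": graph_a, "b": graph_b, "info": info or {}}
        # Step (b): generate an index entry for each molecule's graph.
        for atoms, _bonds in (graph_a, graph_b):
            self.index[composition_key(atoms)].add(record_id)

    def find_by_composition(self, atoms):
        """Look up record ids through the index instead of scanning records."""
        return sorted(self.index.get(composition_key(atoms), set()))


water = (["O", "H", "H"], [(0, 1), (0, 2)])
ammonia = (["N", "H", "H", "H"], [(0, 1), (0, 2), (0, 3)])

store = InteractionStore()
store.store(1, water, ammonia)
```

A production index would more plausibly use substructure keys or fingerprints; the composition key is chosen only because it is short enough to verify by eye.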
The invention still further provides a method for identifying a biomolecular interaction that is
similar to a reference biomolecular interaction, the method executed on a computer and comprising the
steps of:

(a) conducting a similarity search for each molecule in a test biomolecular interaction;

(b) screening the results of the similarity search preferably by selected taxonomy;

(c) assembling a putative biomolecular interaction to create a test record;

(d) accessing one or more records in a data structure stored in the memory, the data structure
having a plurality of records, each of the records containing a reference biomolecular
interaction and information relating to the reference biomolecular interaction; and

(e) matching the test record with each record in the data structure to produce a matching record
containing a reference biomolecular interaction matching the test biomolecular interaction.
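
The five steps can be sketched as below. This is an illustration under stated assumptions: the `identity` function is a toy stand-in for a real similarity search tool, and the sequences, molecule names, taxon labels and record layout are all invented for the example.

```python
from itertools import product


def identity(seq1, seq2):
    """Toy similarity: matching positions divided by the longer length."""
    if not seq1 or not seq2:
        return 0.0
    matches = sum(x == y for x, y in zip(seq1, seq2))
    return matches / max(len(seq1), len(seq2))


def similar(query, library, taxon, cutoff=0.8):
    """Steps (a) and (b): search the library, then screen hits by taxonomy."""
    return [name for name, (seq, tax) in library.items()
            if identity(query, seq) >= cutoff and tax == taxon]


def match_interaction(test_pair, library, reference_records, taxon):
    hits_a = similar(test_pair[0], library, taxon)
    hits_b = similar(test_pair[1], library, taxon)
    # Step (c): every pairing of hits is a putative interaction.
    putative = {frozenset(pair) for pair in product(hits_a, hits_b)}
    # Steps (d) and (e): compare against the stored reference interactions.
    return [rec for rec in reference_records
            if frozenset((rec["a"], rec["b"])) in putative]


# Hypothetical library: name -> (sequence, taxon)
library = {
    "P1": ("MKTAYIAK", "yeast"),
    "P2": ("GSHMLEDP", "yeast"),
    "P3": ("MKTAYIAK", "human"),
}
reference = [
    {"iid": 7, "a": "P1", "b": "P2"},
    {"iid": 8, "a": "P3", "b": "P2"},
]
result = match_interaction(("MKTAYIAR", "GSHMLEDP"), library, reference, "yeast")
```

Here only record 7 is returned: P3 has the same sequence as P1 but is removed by the taxonomy screen in step (b).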

The similarity searches may be based for example on sequence similarity or identity, or
similarities in molecular weights/pIs, mass fingerprinting data or mass spectrometric data, fragment-ion
tag data, peptide masses from enzymatic digestion, fragment ion masses, isotope patterns, and
sequence tag data. Standard tools available in the art for similarity searching and screening can be used.
(For example, the following tools may be used: BLAST: http://www.ncbi.nlm.nih.gov/BLAST/;
BioScan, Fasta3, PropSearch, SAMBA, SAWTED, Scanps, PDF, ExPASy Proteomics Tools:
http://www.expasy.ch/tools; TagIdent: http://www.expasy.ch/tools/tagident.html; PeptIdent:
http://www.expasy.ch/tools/peptident.html; ProteinProspector: http://prospector.ucsf.edu/; MultiIdent;
PeptideSearch: http://www.mann.embl-heidelberg.de/Services/PeptideSearch/PeptideSearchIntro.html;
PROWL: http://prowl.rockefeller.edu/; Mascot: http://www.matrixscience.com/cgi/index.pl?page=/search_form_select.html;
BioSCAN, Pro).

Another aspect of the invention provides a computer system for storing a representation of
one or more biomolecular interactions in a memory in the computer system and for comparing one or
more reference biomolecular interactions to a test biomolecular interaction, comprising:

(a) a database means stored in the memory representing one or more biomolecular interactions;
each of the biomolecular interactions represented by a chemical graph; and

(b) a data structure means for storing a plurality of record means, each record means containing
chemical graphs of the test biomolecular interaction.

The invention also provides a computer system comprising memory means, storage means,
program means, and stored means for building virtual-models of biomolecular interactions in the
computer system comprising:

(a) one or more libraries of reference biomolecular interactions that comprise any number of
attributes or components of the biomolecular interaction which values are either being
used to describe characteristics of the types of biomolecular interactions in the computer
system, or values or data structures used by the program at runtime, or are to be used to
more specifically describe characteristics of individual components of a biomolecular
interaction that each instance of a type of biomolecular interaction is to represent, or
characteristics of each instance of biomolecular interaction in the computer system;
wherein the attributes have values of any type in the computer system or in a network
accessible by the computer system;

(b) means for manipulating the biomolecular interaction by domain experts or program
means comprising visual means for making the biomolecular interactions available
through menus or palettes or programmatic means; and

(c) constructor means to create new instances from the definitions of the biomolecular
interactions, and means to establish directional output-input links between
complementary instances of the biomolecular interactions directly or through
components.

Also provided is a computer system comprising: 

(a) a database having a plurality of records, wherein each record contains a reference
biomolecular interaction defined by a chemical graph and descriptive information from an
external database which information correlates the biomolecular interactions to records in
the external database; and

(b) a user interface allowing a user to selectively view information regarding a biomolecular
interaction.

In an embodiment, a computer system is provided comprising: 

(a) a database having a plurality of records, each of said records containing a 
reference biomolecular interaction defined by a chemical graph and descriptive information
from an external database, which information correlates the biomolecular interactions to
records in the external database;

(b) a processor in communication with said database and responsive to user input to
access records in said database; and

(c) a user interface allowing a user to provide user input to said processor to
selectively view information regarding a biomolecular interaction.

Still further the invention provides a database system comprising a plurality of internal records, 
the database comprising a plurality of records, wherein each record contains a biomolecular interaction
defined by chemical graphs and descriptive information from an external database which information
correlates the biomolecular interactions to records in the external database.

In an embodiment the external database is PubMed. The interface of the computer system may
further comprise user selectable links to enable a user to access additional information for a
biomolecular interaction. The links may comprise HTML links.

Additionally provided is a method of using a computer system to present information, or a method 
of presenting information pertaining to records of biomolecular interactions in a database, the records
containing information identifying the biomolecular interaction and defining the biomolecular
interaction by chemical graphs, the method comprising:

(a) providing an interface for entering query information relating to a biomolecular
interaction;

(b) locating data corresponding to the entered query information; and

(c) displaying the data corresponding to the entered query information. 
In step (b) the data is located by examining records in the database. 

The invention further provides a computer program product comprising a computer-usable
medium having computer-readable program code embodied thereon relating to a plurality of records of
biomolecular interactions, the records identifying the biomolecular interactions and defining chemical
graphs of the biomolecular interactions, the computer program product comprising computer-readable
program code for effecting the following steps within a computing system:

(a) providing an interface for entering query information relating to a biomolecular interaction;

(b) locating data corresponding to the entered query information; and 

(c) displaying the data corresponding to the entered query information. 

The invention contemplates a database storing data relating to biomolecular interactions
comprising: 

(a) first data types describing biomolecular interactions between chemical objects; 

(b) second data types describing collections of biomolecular interactions; and 

(c) third data types describing pathways between said collections of interactions. 

The first data types may include objects for the chemical objects, each of the objects including 
at least one of a pointer to an external database describing the chemical object, a sequence, and a 
chemical graph. The first data types may be stored as records and further include objects identifying the 
biomolecular interactions and defining chemical graphs of the biomolecular interactions.

The second data types may include lists of identifications referencing the biomolecular 
interactions in the collections. The third data types may include objects for the chemical objects that 
can form networks of interactions. The networks of interactions may include metabolic pathways and 
cell signaling pathways. The third data types may additionally include sequences of identifications 
referencing biomolecular interactions that make up the pathways. 
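
The three data types described above could be modeled roughly as follows. This is a sketch only: the class and field names are hypothetical, and the rule that a chemical object carries "at least one of" a pointer, a sequence and a chemical graph is enforced with a simple check.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChemicalObject:
    label: str
    external_ref: Optional[str] = None      # pointer to an external database
    sequence: Optional[str] = None
    chemical_graph: Optional[object] = None

    def __post_init__(self):
        # At least one of the three descriptions must be present.
        if (self.external_ref is None and self.sequence is None
                and self.chemical_graph is None):
            raise ValueError(
                "need at least one of: external_ref, sequence, chemical_graph")


@dataclass
class Interaction:            # first data type
    iid: int
    a: ChemicalObject
    b: ChemicalObject


@dataclass
class InteractionCollection:  # second data type: a list of interaction ids
    cid: int
    interaction_ids: List[int] = field(default_factory=list)


@dataclass
class Pathway:                # third data type: an ordered sequence of ids
    pid: int
    interaction_ids: List[int] = field(default_factory=list)


# Hypothetical example values; the reference strings are placeholders.
kinase = ChemicalObject("kinase", external_ref="example-db:1")
substrate = ChemicalObject("substrate", sequence="MKT")
step1 = Interaction(iid=1, a=kinase, b=substrate)
pathway = Pathway(pid=1, interaction_ids=[step1.iid])
```

Collections and pathways hold identifiers rather than embedded interaction records, which matches the description of lists and sequences of identifications referencing the interactions.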

The systems and products of the present invention may be used to study and identify biomolecular 
interactions. Such information is of significant interest in pharmaceutical research, particularly to 
identify potential drugs and targets for drug development. The systems and products provide great 
power and flexibility in analyzing biomolecular interactions. 

Further features and advantages of the present invention, as well as the structure and operation of 
various embodiments of the present invention, are described in detail below with reference to the 
accompanying drawings.
DESCRIPTION OF THE DRAWINGS

The invention will be better understood with reference to the drawings in which:

Figure 1 - Storing a chemical object - A BIND-object data type. A chemical object can be any
molecule or atom. Associated data types are also shown. Legend: Each box is a data type. Dashed
outline boxes represent ASN.1 fields marked as OPTIONAL. Single headed arrows point to expanded
definition for a data type. Double headed arrows represent one to many relationships (repeated fields
or objects).

Figure 2 - Storing a chemical object (continued). 

Figure 3 - Storing a biomolecular interaction - A BIND-Interaction data structure. 

Figure 4 - Storing the cellular place information - A BIND-place object and associated data
types. General place is saved using enumerated fields for computability and specific place is more 
detailed and human-readable. 

Figure 5 - Storing experimental condition information - A BIND-condition data object and
associated data types. 

Figure 6 - Storing conserved sequence information - a BIND-conserved-seq object and 
associated data types. Conserved sequence may be stored for molecule 'a' or 'b'.

Figure 7 - Storing binding site location - A BIND-loc object and associated data types. Any
number of binding sites may be stored for either molecule 'a' or 'b' in an interaction.

Figure 8 - Storing chemical actions - A BIND-action object and associated components. Any 
number of chemical actions may be stored in an interaction. 

Figure 9 - Storing chemical state - A BIND-state data type and associated objects. Any 
number of chemical states may be stored in an interaction. 

Figure 10 - Representing molecular complexes - A BIND-Molecular-Complex object and 
related data types. 

Figure 11 - Storing biochemical pathways - A BIND-Pathway object and associated data 
types. Cell cycle information can be stored. 

Figure 12 is a schematic diagram showing a software development method;

Figure 13 is a schematic diagram showing a major subsystem overview of BIND; and

Figure 14 is a schematic diagram showing the data entry process for BIND.
DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION

The present inventors have developed BIND or the Biomolecular Interaction Networks 
Database and its related tools for both the management and mining of molecular interaction data. BIND 
permits the rapid identification and visualization of new and known cellular pathways using 
bioinformatics methods, and it provides an understanding of these interaction pathways. BIND is 
stored in memory within a host computer system including one or more computers that is responsible for
maintaining BIND. Biomolecular interactions are received from internal and external providers
through connections to the host computer network and are processed to maintain images and/or text
defining the biomolecular interactions. Images and/or text defining biomolecular interactions in BIND
can be conveyed to end-users through computer connections to the host computer system allowing
end-users to display the biomolecular interaction data on the monitor display screens of their computers.
(See Figure 12 for a schematic diagram of the BIND software development method; Figure 13 for a
schematic diagram showing the major components of the BIND system; and Figure 14 for a schematic
diagram of a data entry process for BIND). In this way, biomolecular interactions can be electronically
managed, located and/or visualized. Further details concerning BIND will now be described.

Biomolecular information is stored in records within BIND. A BIND record can describe any
molecular interaction, stored in a BIND-object as a) a pointer to another database, b) a sequence or c) a
chemical graph. Two BIND-objects that interact are held in an interaction record within BIND. The
interaction record can represent the binding interaction at various levels of detail. BIND also stores
kinetic information, bibliographic information, interaction locations, conserved sequences, mediating
interactions, chemical reactions that take place and activation states of BIND-objects. BIND draws
upon NCBI data format standards, and thus BIND is compatible with other public sequence and
structure databases. BIND forms a data-space that can contain anything from large molecular interactions,
such as a protein signaling complex, to detailed descriptions of atomic level interactions. The design and
database samples and tools to aid in web-based data entry and retrieval are described herein. A
graphical system of data retrieval and data mining agents that scour this 'data space' for novel links
between known interaction pathways can be implemented based on this design.

The BIND Data Model

The three main types of data objects in the BIND specification - interaction, molecular 
complex and pathway - as well as useful database management and data exchange objects are described 
below. Each of the main objects is composed of various descriptor objects that are either defined in the
specification or taken from the NCBI ASN.1 data specification. For example, an interaction record
contains, among other data objects, two BIND-objects. These BIND-objects are themselves defined
using simpler sub-objects. Normally, a BIND-object that is describing a protein sequence will store a
simple link to a sequence database, such as GenBank. If, however, the sequence is not present in the
public database, it can be fully represented using an NCBI-Bioseq object. The NCBI-Bioseq object is
how NCBI stores all of the sequences in GenBank and is a mature data structure.

BIND also inherits the NCBI/EMBL/DDBJ taxonomy model and data, and is designed so that
interactions can be both inter- and intra-organismal. Below is an example of the types of biological data
for each BIND record. Explanations of the various objects in the specification are given along with
examples. The BIND specification is explained as if it were being used to describe a single record in a
database.

The BIND database is generally meant to reference information from other databases rather
than storing the information as a copy. This avoids unnecessary duplication of information among
databases and helps maintain data integrity (if the information in a referenced record in one database is
updated, the other databases that reference the record are all automatically updated). All fields are
non-optional unless stated otherwise.

A BIND-object

A BIND-object represents any chemical object - atom, molecule or complex of molecules. 
See Figures 1 and 2 for a diagrammatic description of the data type. A BIND-object contains:

1. A 'short label' field to contain a short name for a molecule. For example, ATP, IP3, S4 and
HSP70 are acceptable short labels for ligands and proteins, respectively. Having a non-optional short
label ensures that at least some descriptive data is entered for a molecule. This information is also
useful to construct top-level descriptions regarding a particular record. For example, a simple
description of an interaction between two proteins can be constructed using the short labels of the two
BIND-objects in an interaction record. A graphical view of an interaction would be labeled with the
short label field. 

2. A BIND-object-type-id object to contain the type of the molecule and a reference to another
database containing a record for that molecule. In this way, for instance, large DNA records are
referenced rather than duplicated. A molecule type may be 'not-specified', 'protein', 'dna', 'rna', 'ligand',
or 'molecular complex'. Molecules of unknown type may be stored by specifying the type of molecule
as 'not-specified'. This type requires no further data input.

Protein, DNA and RNA all require a BIND-id object. This object can store accession
numbers to any other database. It has special fields 'gi' or GenInfo and 'di' or domain identifier for the
NCBI Entrez system (Schuler et al., 1996) and a database of domains, respectively. Any other
accession number or numbers/strings to reference records in other databases can be stored in a set of 
NCBI Seq-id's present in the data object. All fields in BIND-id are optional so molecules stored 
internally in a BIND record that are not present in other databases (and so do not have accession 
numbers) can be properly saved. 

Molecules of type 'ligand' require a BIND-ligand-id object. This object can contain a
reference to an internal small molecule database or any other small molecule database via a database 
name and an integer and/or character based accession number. 

BIND-objects of type 'complex' require an integer accession number to a BIND molecular
complex record. 

3. A BIND-object-origin data structure. This data structure contains a choice of origin 
between 'not-specified', 'org' or organismal, and 'chem' or chemical. BIND-objects of unknown origin
would have origin type 'not-specified'. Chemical objects that are derived directly from organisms, such
as DNA, would be specified to be origin type 'org' and are required to be associated with an NCBI
BioSource object. A BioSource object can contain much descriptive data about an organism and the
biological source of a compound. It also contains a reference to a taxonomy database. This
information can be entered automatically if a GI is known for a biological sequence molecule, since a
BioSource is part of the NCBI Bioseq object which stores biological sequences in Entrez. If a GI is not
given, a BioSource can be created. 

Molecules derived purely from chemical means are of origin type 'chem' and require a
BIND-chemsource object. The BIND-chemsource object contains a set of names for the chemical, usually a
common name and any synonyms, a SMILES string (Weininger, 1988), the chemical formula,
molecular weight (a RealVal-Units object), and a CAS registry number (http://www.cas.org). A
SMILES string is a standard way of representing a molecule's structure using ASCII characters. Many
chemistry computer applications are available to manipulate and use data of this type. The
three-dimensional structure of a molecule can be predicted from a SMILES string to a high degree of
accuracy using commercial chemistry applications such as Corina (Gasteiger, 1996) and others. A
CAS number is a reference number to the information regarding a chemical compound in the Chemical
Abstracts Service. This service contains data on at least 22,468,564 chemical compounds. Of all the
fields in a BIND-chemsource object, only 'names' is non-optional. This means that for a BIND-object
to be declared a ligand of chemical origin, one must only provide a pointer to a small molecule
database and one name of the chemical. 

4. An optional BIND-cellstage list to contain a list of cell cycle stages in which this object is 
found, or expressed, in the given organism. This information is only relevant for BIND-objects of 
organismal origin. A BIND-cellstage object is an enumeration of all of the basic cell stages in the cell
cycle. It contains an optional text description field that can describe other cell stages that are not 
present in the enumeration. 

5. An optional NCBI Bioseq object to store a biological sequence if a record for the sequence
is not present in any public database. The Bioseq may also be used to store the experimental form,
such as His tagged proteins or mutants, of the biological sequence if it is different from any public
database record. This field is only relevant for biological sequences. Bioseqs can be prepared using
Sequin (Kans et al., 1998) and can be exchanged with NCBI.

6. An optional NCBI Biostruc object to store a three dimensional atomic structure of any 
chemical object, from an atom to a complex of molecules, if the data is not present in any public 
database. The Biostruc specification allows a chemical graph to be stored without coordinates. This is 
most useful for storing small molecule structures or post-translationally modified forms of a 
biomolecule. Thus, chemical entities within a BIND object can be described in precise detail. 

The presence of these powerful and mature data structures in this part of the specification 
demonstrates that BIND is not completely reliant on other databases. Most of the information present 
in any public sequence or 3D molecular structure database can be stored using the BIND specification 
if necessary. 

7. An optional free flow text description of the BIND-object. This field contains, for example,
a full name for a molecule such as Adenosine Triphosphate (ATP).
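
The field list above can be condensed into a small validation sketch. The actual specification is written in ASN.1; the Python names below (`BindObject`, `ChemSource`, `problems`) are hypothetical, the Bioseq and Biostruc fields are omitted for brevity, and a plain dictionary stands in for the BIND-id, BIND-ligand-id or complex accession.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class MoleculeType(Enum):
    NOT_SPECIFIED = "not-specified"
    PROTEIN = "protein"
    DNA = "dna"
    RNA = "rna"
    LIGAND = "ligand"
    COMPLEX = "molecular complex"


class Origin(Enum):
    NOT_SPECIFIED = "not-specified"
    ORG = "org"
    CHEM = "chem"


@dataclass
class ChemSource:
    names: List[str]                  # the only non-optional field
    smiles: Optional[str] = None
    formula: Optional[str] = None
    cas_number: Optional[str] = None


@dataclass
class BindObject:
    short_label: str
    mol_type: MoleculeType = MoleculeType.NOT_SPECIFIED
    db_ref: Optional[dict] = None     # may be empty, but must be present if typed
    origin: Origin = Origin.NOT_SPECIFIED
    biosource: Optional[str] = None
    chemsource: Optional[ChemSource] = None
    cell_stages: List[str] = field(default_factory=list)
    description: Optional[str] = None

    def problems(self) -> List[str]:
        """Return a list of rule violations; an empty list means valid."""
        found = []
        if not self.short_label:
            found.append("short label is required")
        if self.mol_type is not MoleculeType.NOT_SPECIFIED and self.db_ref is None:
            found.append("a typed molecule requires a database reference object")
        if self.origin is Origin.ORG and self.biosource is None:
            found.append("origin 'org' requires a BioSource")
        if self.origin is Origin.CHEM and (
                self.chemsource is None or not self.chemsource.names):
            found.append("origin 'chem' requires a chemsource with a name")
        return found


atp = BindObject(
    short_label="ATP",
    mol_type=MoleculeType.LIGAND,
    db_ref={"db": "internal", "id": 1},   # placeholder accession
    origin=Origin.CHEM,
    chemsource=ChemSource(names=["Adenosine Triphosphate", "ATP"]),
    description="Adenosine Triphosphate (ATP)",
)
incomplete = BindObject("HSP70", MoleculeType.PROTEIN, origin=Origin.ORG)
```

Here `atp.problems()` is empty, while `incomplete.problems()` reports the missing database reference object and the missing BioSource.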

BIND-Interaction 

The BIND-Interaction object is the fundamental component for storing data in this
specification. It defines and describes the interaction between any two molecules, or even atoms. The
majority of the information that can be stored is, however, used to describe interactions between
proteins, DNA and RNA. Interactions between molecules rather than between molecules and atoms are
exemplified from this point on. See Figure 3 for a diagrammatic representation of the data type.

A BIND-Interaction contains an NCBI Date object, a sequence of updates for an audit trail, an
Interaction Identifier (IID) accession number, two interacting molecules (BIND-object), a description
of the interaction, a series of publications and a private flag. The BIND IID number space is to be
controlled using a unique key server. Molecule A binds to molecule B and both are stored using
BIND-objects (described above).

The BIND-descr object stores most of the information in an interaction object. It contains a text
description of the interaction, information on cellular place of interaction, experimental conditions used
to observe the interaction, conserved sequence comment of molecules A and/or B if they are biological
sequences, location of binding sites on molecule A and B, chemical actions mediated by the interaction
and chemical states of the molecules A and B.

A BIND-pub-set is included to store empirical evidence references, usually publications, that
'support', 'dispute' or have 'no opinion' regarding the actual interaction. The dispute flag allows the
database to track experimental trends and offer a machine-readable way to find discrepancies or 
differences of opinion. 

Finally, the private flag, which defaults to FALSE, is included in the BIND-Interaction record.
The flag indicates whether or not to export this record during a data exchange procedure. In a public
database, a private record is not available to the public. This may be because the record has not been
completed or information in the record has not been verified. In a private database, the private flag
means that the record can be viewed internally, but not exported. In this situation, a private record
might contain proprietary information. BIND may contain a mix of these and public records imported
from a public database. 
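
By way of illustration only, the export rule for the private flag may be sketched in Python as
follows. The class and function names (Interaction, exportable) are hypothetical stand-ins for the
ASN.1 definitions and are not part of the specification; the molecules are reduced to plain strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interaction:
    """Hypothetical stand-in for a BIND-Interaction record."""
    iid: int                      # Interaction Identifier (IID)
    molecule_a: str               # simplified BIND-object for molecule A
    molecule_b: str               # simplified BIND-object for molecule B
    updates: List[str] = field(default_factory=list)  # audit trail entries
    private: bool = False         # defaults to FALSE, as described in the text


def exportable(records: List[Interaction]) -> List[Interaction]:
    """Return only the records that may be exported in a data exchange."""
    return [r for r in records if not r.private]


records = [
    Interaction(1, "protein X", "protein Y"),
    Interaction(2, "protein X", "compound Z", private=True),
]
exported_ids = [r.iid for r in exportable(records)]
```

In this sketch a record created without an explicit flag is public, and only the record marked
private is withheld from the exported set.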
Interaction description - BIND-descr 

All of the objects directly linked in this structure are optional to allow any level of richness of 
data to be stored. BIND-descr contains:

1. A simple text description of the interaction. This free flow text is meant to be a short
description of the interaction such as, "transcription factor X binds to a region of human DNA in 
section x of chromosome 11". 

2. A sequence of BIND-place objects. See Figure 4 for a diagram of this data type. A BIND-place
object stores information about the location of the interaction with respect to the cell. The place
of an interaction is meant to be the location where molecule A and B come together in a biologically
meaningful way. This object contains a BIND-gen-place-set object for storing general place data, an
optional BIND-spec-place-set object for storing specific place data, an optional BIND-pub-set for
storing publications referring to the localization of an interaction, and an optional text description field.
A BIND-gen-place-set contains a start and an optional end place for the interactions, specified by an
enumerated list of general places in the cell. Storing a start and an end place for an interaction takes
into account the possibility of an interaction translocating across membranes and ending up in different
sub-cellular compartments. The general enumeration of cell places allows a computer to understand
the location of the interaction. Only basic cell places are present in the list. This is important for data
visualization programs that need to be able to draw molecules in the correct places on a diagram of a
cell. A human readable description of cellular place can be stored in the BIND-spec-place-set. This
object contains a text description of a start and an optional end place for an interaction. More specific
data regarding the location of interaction, such as in what part of a membrane, apical or basal, an
interaction occurs can be stored in the BIND-spec-place-set object.

Multiple BIND-place objects are present to allow storage of an interaction that may be present
only at certain separate places within and around the cell. More than one BIND-place object can also
be used to describe an interaction occurring between two molecules over multiple sub-cellular
compartments, as might be the case for transmembrane receptor proteins with large extra- and
intra-cellular domains.

3. A BIND-condition-set to store a list of experimental conditions used to observe the
interaction. See Figure 5 for a diagrammatic view of the BIND-condition data type. Experimental
conditions information stored should be sufficient to allow recreation of the original experiment. An
experimental condition is described using a BIND-condition object. This object contains an
Internal-conditions-id (ICID) number which can be used to reference a particular experimental condition
in the BIND-condition-set. A general experimental condition is an enumeration of three general
conditions, in-vitro, in-vivo and other. A BIND-experimental-system object is present and is an
enumeration of most popular experimental techniques, with 34 techniques listed in the specification.
This field has been simply declared as an INTEGER enumeration type so that it can be easily extended
with new experimental systems as they become available. Declaring a type as INTEGER in ASN.1 instead
of enumeration prevents generated code from checking the name of the enumerated value against the
specification. This means that items may be added to the list at a later date without disrupting tools
that are based on previous specifications. A BIND-condition object also contains a free human
readable text description. This field could be used to describe a system further or could be used to
name a system if 'other' has been specified as the BIND-experimental-system object. A BIND-pub-set
is also provided in order to store publications related to the experimental systems described in the
BIND-condition object.
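
The benefit of declaring the experimental system as a plain INTEGER can be sketched as follows. This
is an illustrative Python analogue only: the class name, the technique names and the numeric values
are assumptions made for the example and are not taken from the specification.

```python
from enum import IntEnum


class ExperimentalSystem(IntEnum):
    """Hypothetical subset of known techniques; the values are made up."""
    TWO_HYBRID = 1
    AFFINITY_CHROMATOGRAPHY = 2
    OTHER = 255


def describe_system(value: int) -> str:
    """Name a stored integer, tolerating values added by later specifications."""
    try:
        return ExperimentalSystem(value).name
    except ValueError:
        # The field is a plain integer, so an older tool can still carry the
        # value through even though it cannot name the technique.
        return f"unrecognized system {value}"


known = describe_system(1)
newer = describe_system(42)
```

A tool built against an earlier list names the values it knows and passes the others through
unchanged, rather than rejecting the record.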

4. A BIND-cons-seq-set to store information about evolutionarily conserved sequence if either
molecule A or B is a biological sequence. See Figure 6 for the diagram of this data object. This
information is simply meant to be a comment on the possible importance of certain sequence elements
that have been noticed to be conserved via phylogenetic or other evolutionary analysis. It is possible
that information about conserved sequence is known for molecules in an interaction that is not very
well characterized. This data might be useful to investigators interested in further studying the
interaction. A BIND-cons-seq-set contains conserved sequence information about molecule A and B in
a BIND-conserved-seq object. Semantically, a BIND-conserved-seq object may only be instantiated
with data if the molecule that it refers to is a biological sequence. A BIND-conserved-seq object
contains an NCBI Seq-loc object. A Seq-loc can contain a location or a set of locations for any linearly
numbered biological sequence. A free text description is also included in a BIND-conserved-seq. It is
suggested that the method of determining the conserved sequence, for example a phylogenetic tree
program such as PHYLIP (http://evolution.genetics.washington.edu/phylip.html) or an alignment
program such as PSI-BLAST (Altschul et al., 1997) or CLUSTAL (Higgins et al., 1996) be stored in
the 'descr' field. A BIND-pub-set object is provided to store publications pertaining to a conserved
sequence comment.

5. A BIND-loc to store binding site information. Figure 7 contains a diagrammatic view of
this data type. The BIND-loc can store 3D atomic level detail of an interaction site using an NCBI
Biostruc. A BIND-loc-gen object is present to store binding sites in an interaction at the sequence
element level of detail. Therefore, only interactions involving biological sequences can hold general
binding site information. The BIND-loc object also includes a BIND-pub-set for storing publications
related to binding site. All top level fields are optional allowing detailed, general and/or source
information to be represented. Expanding further, the BIND-loc-gen object contains a list of binding
sites on molecule A and a list of binding sites on molecule B. This information is contained in a
BIND-loc-site-set object which contains a sequence of binding sites defined in BIND-loc-site objects.
Each BIND-loc-site element contains an NCBI Seq-loc element and an internal reference integer ID
called a BIND-Seq-loc-id. Since each binding site is numbered in a BIND-loc-site-set, it can be
referenced by other objects.

A BIND-loc-gen object also contains an optional BIND-loc-pair object which specifies which
binding sites on A bind to which binding sites on B. The binding sites are referenced from the BIND- 
loc-site-set objects so in order to use a BIND-loc-pair object, binding sites on molecule A and B must 
already be defined. This simple binary mapping allows most experimental binding information, such 
as that generated from footprinting analysis, to be stored. 
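
The requirement that paired binding sites already be defined can be sketched as a simple consistency
check. The Python below is illustrative only: each site is reduced to a hypothetical (start, end)
residue range keyed by its BIND-Seq-loc-id, in place of a full NCBI Seq-loc, and the function name
check_pairs is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical simplification: BIND-Seq-loc-id -> (start, end) residue range.
Site = Tuple[int, int]


def check_pairs(sites_a: Dict[int, Site], sites_b: Dict[int, Site],
                pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the pairs that reference a site id not yet defined on A or B."""
    return [(a, b) for a, b in pairs if a not in sites_a or b not in sites_b]


sites_a = {1: (10, 25), 2: (80, 95)}   # two binding sites on molecule A
sites_b = {1: (300, 312)}              # one binding site on molecule B
dangling = check_pairs(sites_a, sites_b, [(1, 1), (2, 3)])
```

Here the pair (2, 3) is reported because site 3 has not been defined on molecule B, whereas the pair
(1, 1) maps a defined site on A to a defined site on B.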

6. A set of BIND-actions to describe the chemical action(s) mediated by this interaction.
Figure 8 shows a diagram of this data type and related objects. A set of actions is required because
there are many examples of interactions having multiple chemical actions. For instance, a kinase may
phosphorylate a protein more than once in separate chemical actions or a restriction enzyme may
cleave a molecule of DNA in more than one place. A BIND-action-set contains a set of elaborate
BIND-action objects. Each BIND-action object in a set is numbered with an Internal-action-id (IAID)
integer so that it can be referenced by other data types.

A BIND-action object contains an IAID number, an optional text description field for free
flow text description of the chemical action and an optional BIND-pub-set for storing publications
pertaining to this chemical action. A boolean flag is included to specify the direction of the chemical
action. If a-on-b is set to true, then molecule A acts on molecule B, and vice versa. This value defaults
to true. The type of action is defined in the BIND-action-type object. The BIND-action-type object is
a choice element that stores the type of chemical action and an associated data object. The possible
choices of actions are 'not-specified' for an unknown chemical action type, 'add' for adding a chemical
object, 'remove' for removing a chemical object, 'cut-seq' for a cut in a biological sequence,
'change-conformation' for a change in conformation, 'change-configuration' for a change in configuration,
e.g. by an epimerase or isomerase, 'change-other' for another type of change, such as a metal ion
exchange, and 'other' for any other chemical action. Types 'add', 'remove' and 'cut-seq' are associated
with a BIND-action-object to store related data.

A BIND-action-object is a choice element that can store nothing, with a choice of NULL, a
BIND-object, or a site on a sequence using a Seq-loc. The 'object' choice of the BIND-action-object is
only relevant for the 'add' and 'remove' choices of the BIND-action-type. The BIND-object is meant to
store a description of the chemical compound that is added or removed. An example would be a
phosphate group that could be added by a kinase enzyme or removed by a phosphorylase enzyme. The
'location' choice of the BIND-action-object is only relevant for the 'cut-seq' choice of the
BIND-action-type. The Seq-loc is meant to store the position(s) where a biological sequence is cut. An
example would be the locations after which a restriction enzyme cuts DNA or the sites after which a
protease cleaves in a protein. The choice of 'none' can be used for either 'add', 'remove' or 'cut-seq' if
information that would otherwise be stored is not known.
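
The relationship between the action type and its associated action object may be sketched as a
lookup table. The Python below is an illustration under stated assumptions: the ALLOWED table and the
function valid_action are hypothetical constructs, and the rule that other action types carry no
action object is a reading of the text above rather than a quotation of the ASN.1.

```python
from typing import Optional

# Which BIND-action-object choices each action type may carry, per the text:
# 'object' only with add/remove, 'location' only with cut-seq, and 'none'
# with any of the three when the detail is not known.
ALLOWED = {
    "add": {"none", "object"},
    "remove": {"none", "object"},
    "cut-seq": {"none", "location"},
}


def valid_action(action_type: str, object_choice: Optional[str]) -> bool:
    """Check an action type against its associated action-object choice."""
    if action_type in ALLOWED:
        return object_choice in ALLOWED[action_type]
    # Assumption: the remaining action types carry no action object at all.
    return object_choice is None


kinase_ok = valid_action("add", "object")        # e.g. a phosphate group added
cut_with_object = valid_action("cut-seq", "object")
```

Under this reading, a phosphate added by a kinase is an 'add' action carrying an 'object', while a
'cut-seq' action carrying an 'object' is rejected.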

The BIND-action object also includes an optional result field to store the resulting molecule(s)
from a chemical action as a sequence of BIND-objects. For instance, if a molecule of DNA was
methylated, the description of the methylated DNA could be stored in a BIND-object. If a protein
molecule was cut at various locations, all resulting protein molecule fragments could be described with
the BIND-object sequence. With a sequence of interacting proteins where A binds to B, B binds to C,
etc., the result field storing the full chemical form of B in the A-B interaction, for example, could be
used directly in the B-C interaction record. This allows the exact description of sequential chemical
modifications on a biological sequence that would otherwise not be possible given the standard
sequence representation alone.

A Biostruc-feature-set that can contain residue or atomic level of detail differences in a
molecule created by this chemical action is also present in a BIND-action object. The molecule that is
different in this case is based on the direction of the chemical action. If the direction is molecule A to
B, any information stored in the diff field would pertain to molecule B, not A. This field allows even
small changes to molecules to be represented, as in the example of a chemical action reducing a double
bond by adding two hydrogen atoms across it. The addition of the two hydrogen atoms could be
recorded as differences on an atomic structure. This information requires the presence of atomic level
detail data for the molecule being changed. The diff field can also represent changes made to the
substrate of the chemical action. In an example of a phosphate added to a protein on a specific tyrosine
residue by a phosphokinase enzyme, the diff field would simply be the position in the protein sequence
of the tyrosine that was being changed.

An optional BIND-signal object is included in the BIND-action object to store directional
information related to chemical signal as it is found in cell signaling pathways. This data is really a
more general notion of kinetics describing signal transduction. The signal could, for example, be the
activation of proteins in a signaling cascade via phosphorylation such as in a MAP kinase pathway.
The BIND-signal object contains an enumerated type describing the signal modification from a top-level
viewpoint. Possible values are 'none', 'amplify', 'repress', 'auto-amplify', 'auto-repress', and 'other'. The
direction of the signal is stored in the a-to-b boolean flag, which defaults to true. If a-to-b is true, the
direction of signal is from molecule A to molecule B and vice versa. An optional RealVal-Units field
can store the factor of signal amplification or repression if they occur. Signal amplification in the cell
is really just the recruitment of molecules one step further down in the pathway by the molecule at the
current step. So, if molecule A activates molecule B by removing a phosphate in a signaling pathway
and there is amplification at this step, in the cell, molecule A activates many molecules of B causing a
strengthening of the chemical signal by a measurable factor that may be stored. An optional free text
description is available in the BIND-signal object as well. This field should contain some description
of the signal action if 'other' is specified in the 'action' field.

Kinetic and thermodynamic data may also be optionally stored in the BIND-action object 
using the BIND-kinetics object. The BIND-kinetics object offers specified real value and text 
description fields for common kinetics (e.g. Michaelis-Menten) and thermodynamic values as well as 
providing a sequence of BIND-kinetics-other objects to store any other text or real number values that 
may be pertinent. A BIND-pub-set object is also present to store publications that relate to any of the 
information stored. All objects in the BIND-kinetics object are optional to allow any combination of 
values to be stored. 

Also in the BIND-action object, a link to a sequence of experimental conditions used to
observe this chemical action is optionally provided using a sequence of BIND-condition-dependency
objects. The BIND-condition-dependency objects reference previously defined experimental
conditions by Interaction-id and Internal-conditions-id number. In this way, any experimental
condition in a database using this specification may be uniquely referenced.

7. A BIND-state-descr object for storing information on chemical state of molecule A or B.
See Figure 9 for a diagrammatic view of this data type. The BIND-state-descr object stores a list of
possible chemical states for molecules A and B in BIND-state-set objects as well as references to
defined chemical states of A and B that are required for the interaction to take place, in
BIND-required-state objects. More than one possible state can be saved because certain molecules can
assume multiple states. One example is a protein enzyme which may be multiply phosphorylated to bring
about different enzymatic activity levels, depending on the phosphorylation level. All fields in the
BIND-state-descr object are optional allowing any combination of data objects to be stored. A
BIND-state-set contains a sequence of BIND-state objects each numbered by an Internal-state-id (ISID)
integer. Each BIND-state object contains an optional enumerated list describing the general activity of
the molecule, an optional sequence of BIND-state-cause objects, an optional free text description, and
an optional BIND-pub-set for storing publications related to this chemical state description. The
'activity-level' list is a simple description and is purely subjective, but is still useful for discriminating
various states of different activity, especially by a data visualization program which could colour
molecules based on this information.

The BIND-state-cause object can be used to uniquely reference previously defined chemical
actions from this or other interactions that bring about this state. It contains an IID and an IAID. This
functionality is very important in the specification because it allows full chemistry to be described
when chemical actions and chemical states are taken together. Full chemistry means that all substrates,
enzymes, products, bio-processed compounds etc. may be represented in full atomic level detail for all
steps in a pathway. A certain chemical action can have a result (in the 'result' field of a BIND-action
object) and a certain chemical state can reference the action that occurred to create it. In this way
bi-directional linked lists can form networks that represent true chemical networks in a cell.
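
The linkage between chemical states and the actions that cause them can be sketched as a pair of
lookups. In the illustrative Python below, the dictionary state_causes and the function
states_caused_by are hypothetical; an action is identified by its (IID, IAID) pair and a state by an
(IID, ISID) pair, and the example data are invented.

```python
from typing import Dict, List, Tuple

ActionRef = Tuple[int, int]   # (IID, IAID) of a chemical action
StateRef = Tuple[int, int]    # (IID, ISID) of a chemical state

# Hypothetical data: each state lists the actions that bring it about,
# which is the role played by BIND-state-cause objects.
state_causes: Dict[StateRef, List[ActionRef]] = {
    (2, 1): [(1, 1)],          # caused by action 1 of interaction 1
    (3, 1): [(2, 1), (1, 1)],  # caused by two different actions
}


def states_caused_by(action: ActionRef) -> List[StateRef]:
    """Invert the cause links to walk forward from an action to its states."""
    return sorted(s for s, causes in state_causes.items() if action in causes)


downstream = states_caused_by((1, 1))
```

Following state_causes walks backward from a state to its causing actions, and states_caused_by walks
forward from an action to the states it produces, which together give the bi-directional traversal
described above.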
A Molecular Complex - BIND-Molecular-Complex

The BIND-Molecular-Complex object is the second of three top-level biological objects in the
BIND specification. It is meant to store a collection of more than two interactions that form a complex,
i.e. three or more BIND-objects that can operate as a unit. In this way, it is useful to store knowledge
of molecular complexes and as a shorthand for use when defining interactions and pathways (see
BIND-pathway). Figure 10 provides a box diagram view of this data type.

A BIND-Molecular-Complex object contains similar administrative information fields as a
BIND-Interaction object. A Molecular-Complex-id (MCID) integer accession number is stored to
uniquely identify molecular complexes. A BIND-pub-set is present to store publications that concern
this molecular complex and a private flag is provided to mark this record as private using the same
rules as the private flag of the BIND-Interaction record.

Six other fields in the molecular complex store data directly relating to the complex. A 'descr'
field optionally provides space for a human readable free text description of the molecular complex.
The 'sub-num' field contains a BIND-mol-sub-num object that stores the number of sub-units
(BIND-objects) in the molecular complex. The sub-unit number includes either an exact integer using the
'num' field or a fuzzy integer in the 'num-fuzz' field. The fuzzy number is stored using an NCBI
Int-fuzz object which can store a number in a range, plus or minus a fixed or percentage amount, or store a
set of alternatives for the number. Using a fuzzy number, complexes can be stored even when the exact
number of sub-units is not known. Examples of such complexes are actin filaments or other parts of
the cytoskeleton and virus coat proteins, both of which typically form using repeated units of a certain
protein.
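
The notion of a fuzzy sub-unit number may be sketched as follows. This Python is a simplified
analogue and not the NCBI Int-fuzz definition itself: the class FuzzyCount, its fields and the helper
plus_minus are hypothetical, and only the range and alternatives forms are modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class FuzzyCount:
    """Hypothetical analogue of a fuzzy integer: a range or a set of alternatives."""
    low: Optional[int] = None
    high: Optional[int] = None
    alternatives: FrozenSet[int] = frozenset()

    def admits(self, n: int) -> bool:
        """True if n sub-units is consistent with this fuzzy value."""
        if self.alternatives:
            return n in self.alternatives
        if self.low is None or self.high is None:
            return False
        return self.low <= n <= self.high


def plus_minus(centre: int, delta: int) -> FuzzyCount:
    """Build a range from a value plus or minus a fixed amount."""
    return FuzzyCount(low=centre - delta, high=centre + delta)


coat = plus_minus(60, 5)                               # 55 to 65 sub-units
two_or_four = FuzzyCount(alternatives=frozenset({2, 4}))
```

With such a value, a complex whose sub-unit count is known only approximately can still be recorded
and later checked against an observed count.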

The BIND-Molecular-Complex object also includes a 'sub-units' field to store the actual
sub-units of the complex as a sequence of BIND-mol-object data types. The BIND-mol-object is simply a
wrapper for a BIND-object that allows the BIND-object to be numbered using a BIND-mol-object-id
integer (BMOID). Numbering the sub-unit BIND-objects allows the BIND-mol-object-pair to reference
them for topology, as discussed below.

A primary component of the BIND-Molecular-Complex object is a list of Interaction-ids,
which references previously defined interactions in a database. This means that most of the data for
function, state, location, etc. for a molecular complex is actually stored in BIND-Interaction objects.
This avoids some duplication of information. A boolean flag marks the interaction list as being ordered
or not. This should be true if the temporal order of interactions that form the complex is known and the
IID list is ordered in that way. Ordering of sub-unit binding for some well studied biological
complexes, such as the ribosome, is known.

An optional sequence of BIND-mol-object-pair objects is present in the BIND-Molecular-Complex
object and is meant to store a two-dimensional topology of the molecular complex. A
BIND-mol-object-pair object simply records a connected pair of BIND-mol-objects in the molecular complex
by making a reference to two BMOID numbers of the sub-units that are connected. Together the
BIND-mol-objects, as nodes, and the BIND-mol-object-pair objects, as edges, can describe the
computer science concept of a graph. The topology information can allow a data visualization program
to draw a representation of the actual shape of the complex.
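
The graph formed by the sub-units and their connected pairs can be sketched as an adjacency map. The
Python below is illustrative only; the function name adjacency and the three-sub-unit example are
assumptions, with each sub-unit represented solely by its BMOID.

```python
from typing import Dict, List, Set, Tuple


def adjacency(bmoids: List[int], pairs: List[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map from sub-unit ids and connected pairs."""
    graph: Dict[int, Set[int]] = {b: set() for b in bmoids}
    for a, b in pairs:
        if a not in graph or b not in graph:
            raise ValueError(f"pair ({a}, {b}) references an undefined sub-unit")
        graph[a].add(b)
        graph[b].add(a)
    return graph


# A hypothetical complex in which sub-unit 2 bridges sub-units 1 and 3.
topology = adjacency([1, 2, 3], [(1, 2), (2, 3)])
```

A visualization program could lay out the complex from such a map, drawing each BMOID as a node and
each recorded pair as an edge.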

Because most of the data for complexes is referenced from BIND-Interaction records, a certain
amount of automatic data entry can be used. A list of sub-units and the number of sub-units can be
automatically entered by fetching the data from the given list of interaction records.
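
Such automatic entry might be sketched as follows. The Python is an illustration under stated
assumptions: the interaction store, the molecule names and the function derive_subunits are
hypothetical, and each molecule is reduced to a plain string.

```python
from typing import Dict, List, Tuple

# Hypothetical interaction store: IID -> (molecule A, molecule B).
interactions: Dict[int, Tuple[str, str]] = {
    10: ("subunit alpha", "subunit beta"),
    11: ("subunit beta", "subunit gamma"),
}


def derive_subunits(iids: List[int]) -> List[str]:
    """Collect distinct molecules, in first-seen order, from the listed interactions."""
    seen: List[str] = []
    for iid in iids:
        for molecule in interactions[iid]:
            if molecule not in seen:
                seen.append(molecule)
    return seen


subunits = derive_subunits([10, 11])
sub_num = len(subunits)
```

Counting distinct molecules in this way gives only a lower bound when a complex contains repeated
copies of the same sub-unit, so a derived count of this kind would still need review.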

It can also be noted that a molecular complex can be defined if the pairwise interactions of
which it is composed are not completely known. This can be done by creating a set of interaction
objects with molecule A as a sub-unit of the complex and molecule B as 'not-specified'. This is useful
since many preliminary studies of a molecular complex observe only that certain molecules interact,
e.g. from gel data, but not how they interact.

A Pathway - BIND-Pathway

The final top-level biological object in the BIND specification is the BIND-pathway data type.
It describes a collection of more than two interactions that form a pathway, i.e. three or more
BIND-objects that are generally free from each other, but can form a network of interactions. Common
examples include metabolic pathways and cell signaling pathways. See Figure 11 for the box diagram
for this data type.

A BIND-Pathway object contains similar administrative information fields as a
BIND-Interaction and a BIND-Molecular-Complex. Two other fields in the BIND-pathway object store
information describing the pathway. A sequence of Interaction-ids that reference previously defined
interactions that make up this pathway is stored. Extra descriptive information regarding the pathway is
stored using a BIND-path-descr object. This object can optionally store free text describing the
pathway and an optional sequence of BIND-cellstage objects that represent the phases of the cell cycle
in which this pathway is in effect. Parts of the pathway may be constitutively present in the cell, while
other parts that complete the pathway and allow activation may only be expressed at certain times
during the cell cycle.

Other BIND ASN.1 objects
Publication Set

A BIND-pub-set is used to hold all publications in BIND. It contains a list of BIND-pub-objects
and a dispute flag. A BIND-pub-object contains an optional free text description of the
publication, an enumerated opinion of the publication field and an NCBI Pub object. The description
field may hold any text data pertaining to the publication referenced by this object. The opinion field
may hold the values: 'none', 'support' and 'dispute'. It is meant to convey the general opinion of the
referenced publication in regard to the information in the ASN.1 object that contains the BIND-pub-set.
The NCBI Pub object is used to store most of the data in PubMed and can represent almost any
publication. It should be used to store a reference to PubMed whenever possible using either a Medline
Unique Identifier (MUID) or a PubMed unique identifier (PMID).
Record Update 

If a record is updated in BIND, a description of the update should be added to a
BIND-update-object. This object contains an NCBI Date object and a text description field. The description
field may contain any information that a database implementation decides to store, but it should be
complete and stored in a standard and automatic way within each implementation so that it can be easily
parsed. Any information may be stored up to and including the entire previous record in ASN.1 value
notation. This data is not meant to be human entered but rather maintained as a machine generated audit
trail of any changes made.

Data exchange and data cross-referencing 

Data exchange systems and database management data structures have been included in the 
specification as powerful tools to make implementations more robust. BIND-Submit is the top-level 
object for data exchange while the cross referencing system involves many separate top-level data 
objects. 

Data exchange - BIND-Submit 

The BIND-Submit object can be used to exchange any number of the top-level data types in
the BIND specification, BIND-Interaction, BIND-Molecular-Complex, and/or BIND-Pathway objects.
BIND-Submit stores an NCBI Date object, an optional BIND-Database-Site, a BIND-Submitter object,
an optional BIND-Submit-id integer for identifying this submission, and fields for optionally storing
BIND-Interaction-set, BIND-Complex-set, and BIND-Pathway-set objects.

A BIND-Database-site is a description of a database site. This object could be used if data
was being submitted to BIND from any other database. It contains free text description of the database
site, usually the database name. Also present is a text field for database country of origin and an
optional field used to store the World Wide Web Universal Resource Locator (WWW URL) of the
homepage of the database on the Internet. An optional NCBI Pub object can store a Medline reference 
for this database. 

A BIND-Submitter object contains information about a submitter to a BIND database.
BIND-Submitter stores a BIND-Contact-info object which contains information about a person. A "hold until
published" boolean flag is present which defaults to false to allow data submission prior to publication.
Also present is an optional enumeration of possible submission types, either 'new', 'update', 'revision',
or 'other'. An update is a change by an author while a revision is a non-author update. A free text field,
'tool', stores the name and version of the tool used to submit the record.

Personal contact information may be kept separate from BIND records to keep the submitter 
and ownership information anonymous and protected from improper use. 

Actual records are stored in the BIND-Submit object in data set data types. The
BIND-Interaction-set, BIND-Complex-set and BIND-Pathway-set are all present in the BIND-Submit object
and are analogous in that they optionally store the date on which the set was collected, optionally the
database from which the record set originates using a BIND-Database-site, and the respective sequence
of records.

Cross-Referencing the Data 

Since the BIND specification describes biological data from interactions to pathways and 
networks of pathways, the information space represented resembles a largely undirected graph with 
molecules as vertices and their interactions as edges. Cross-referencing information allows the graph
to be easily traversed using simple indexed lookup techniques. If cross-referencing were not used in a
system such as this, all records would have to be examined at each traversal of the data space. Instead
of creating traditional large, unwieldy indexes and tables to speed the traversal process, ASN.1 objects
are directly specified to store cross-reference information. This represents an object oriented database
index system. Each BIND database accession number as well as NCBI GI, MUID and PMID and 
SLRI DI accession numbers has its own associated cross-reference object. This information may be 
easily exported and used by other databases to link their sequence or structure data back to BIND. 

When updating cross-reference information, only one level of the graph is traversed, so as not 
to make the index overly complicated. Any time one of the three top level objects is created that 
contains a cross-referenced accession number, the BIND-Cross-Ref object lists are updated. In this
way, any search using a cross-referenced accession number instantly retrieves all of the interaction, 
complex and pathway records that contain it. 
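
The one-level update of cross-reference lists can be sketched as a simple inverted index. The Python
below is illustrative only: the dictionary gi_to_iids, the function add_interaction and the accession
numbers are hypothetical, and only the interaction list of a sequence cross-reference is modelled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical index: sequence accession (e.g. a GI) -> IIDs that contain it.
gi_to_iids: Dict[int, List[int]] = defaultdict(list)


def add_interaction(iid: int, gis: Tuple[int, int]) -> None:
    """Update the cross-reference lists when a new interaction is created."""
    for gi in set(gis):        # a self-interaction is indexed only once
        gi_to_iids[gi].append(iid)


add_interaction(1, (1001, 1002))
add_interaction(2, (1002, 1003))
hits = gi_to_iids[1002]
```

A search by accession number then becomes a single indexed lookup rather than an examination of
every record in the database.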

The interaction cross-reference data is stored in a BIND-Iid-Cross-Ref object. This data type 
contains the IID of the interaction being cross-referenced in this object. The 'iids', 'pids' and 'mcids'
fields contain a list of IIDs, PIDs and MCIDs, respectively of interactions, pathways and complexes 
that contain this interaction. A BIND-Submitter object is included to privately store submitter 
information for every interaction. 

Molecular complex cross-reference information is stored in a BIND-Mcid-Cross-Ref object 
which is completely analogous to the BIND-Iid-Cross-Ref object.

Pathway cross-reference data is contained in a BIND-Pid-Cross-Ref object. This object only
keeps a list of submitters for each pathway record. Since no other objects can reference a pathway 
record, the BIND-Pid-Cross-Ref object does not contain references to other records. 

The GI/DI cross-reference information is stored in a BIND-Cross-Ref object. This object
links a biological sequence to a list of interactions, molecular complexes and pathways that contain it.

PMID/MUID cross-reference data is maintained in a BIND-Pub-Cross-Ref object. This
cross-reference scheme is analogous to that of GI/DI accession numbers.

The full cross-reference system allows quick and easy searching of the database by any of the 
five indexed accession numbers. 
Exported data types 

Typical ASN.1 data specifications make certain data types available for use by other ASN.1
specifications by exporting them. BIND currently exports the top-level data types BIND-Submit,
BIND-Interaction, BIND-Interaction-set, BIND-Pathway, BIND-Pathway-set, BIND-Molecular-Complex
and BIND-Complex-set.
Flat-file Record Format 

Many current biological data specifications are available in a flat-file format for use with
simple flat-file databases. Examples include the GenBank flat-file format and the FASTA format for
biological sequence representation. BIND can be made available in a flat-file database record format
that mirrors the BIND ASN.1 specification. Therefore, BIND may include ASN.1 <=> flat-file conversion
software tools.
Data Entry 

BIND may rely on the following different sources for data entry.

1. Manual data entry:

Data is entered manually via web based forms handled by CGI scripts on a World Wide Web 
Server. This allows entry of data from individual computers on a users own time, from anywhere in the 
world. BIND indexers review and validate public entries as they arrive. Researchers can enter their 
15 data after they have finished an experiment. 

(i) Curated Data entry: 

Data that is already present in the literature will be entered into BIND. 

(ii) User data entry: 

When a paper about protein or DNA sequence or protein structure is about to be published, an 
author generally obtains an accession number to a database, such as GenBank or PDB. An author of a 
paper containing information about biomolecular interactions, complexes, or pathway information, will 
obtain a similar accession number from BIND. A BIND indexer will validate the incoming data and 
issue an accession number. This follows the GenBank model. 

2. Automated data entry: 

Data gathering agents will gather data from various sources on the Internet. Possible examples 
include: 

A. NCBI's MMDB structure database that contains many protein multimers, with 
accompanying detailed atomic interaction information. Web site: 
www.ncbi.nlm.nih.gov/Structure/ 

B. DIP (Database of Interacting Proteins) contains many protein-protein interactions. Web 
site: http://ampere.mbi.ucla.edu:8801/ 

C. FlyNets - the Drosophila interactome database - contains protein-protein, protein-DNA and 
protein-RNA interactions. Web site: http://gifts.univ-mrs.fr/GIFTS_home_page.html 

D. Ligand DB - compound database from Japan. Web site: 
www.genome.ad.jp/dbget/ligand.html 

E. Klotho - another compound database. Web site: 
www.ibc.wustl.edu/moirai/klotho/compound_list.html 

3. Data entry direct from experimental systems: 



A separate instance of BIND, a BIND satellite database, can be used as a local repository to 
store experimental data as it is gathered, but before it is analyzed. Any data that is then used in a 
publication can then be transferred easily to the public database. A BIND satellite can download the 
current public database and merge it with local data. 
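A minimal sketch of the satellite workflow is given below, under stated assumptions: records are plain dictionaries keyed by an interaction ID, a local record takes precedence over a public record with the same ID, and the priv flag of the interaction record marks data not yet released. The function names merge_satellite and publishable are hypothetical and the source does not specify how clashes are resolved.

```python
def merge_satellite(public_records, local_records):
    """Merge a downloaded public snapshot with local satellite records.

    Assumption: on an ID clash the local record wins, so that local,
    not-yet-published edits survive the merge.
    """
    merged = {rec["iid"]: rec for rec in public_records}
    for rec in local_records:
        merged[rec["iid"]] = rec
    return [merged[iid] for iid in sorted(merged)]


def publishable(local_records):
    """Select local records that are not flagged private."""
    return [rec for rec in local_records if not rec.get("priv", False)]


# Example with made-up records.
public = [{"iid": 1, "descr": "public copy"}]
local = [
    {"iid": 1, "descr": "local edit", "priv": True},
    {"iid": 7, "descr": "new local result", "priv": False},
]
print(merge_satellite(public, local))
print(publishable(local))
```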

Examples of experiments that can be used to locate interactions include: 

A. Immunoprecipitation 

B. Affinity chromatography 

C. Yeast two-hybrid 

D. DNA footprinting 

E. Reconstitution experiments (using various detection tools such as FRET, hydroxyl radical 
footprinting, isotope exchange combined with mass spectroscopy, and fluorescence anisotropy) 
Accessing BIND Data 

BIND can be accessed via a user-friendly Web interface on the Internet, and anyone using a 
current web browser can access BIND data. BIND records may be searchable by Interaction ID (iid), 
Molecular Complex ID (mcid), Pathway ID (pid), NCBI gi, and PubMed or Medline ID. The data can 
be text indexed and searchable using keywords. There is a BLAST interface to BIND. 
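The keyword search mentioned above could, for example, rest on a simple inverted index over the text description of each record. The following is an illustrative sketch only; the tokenisation rule, the record layout and the names build_keyword_index and search are assumptions, and the actual BIND text index is not described in the source.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_keyword_index(records):
    """Map each lower-cased word of a record description to the record IDs."""
    index = defaultdict(set)
    for rec in records:
        for word in WORD.findall(rec["descr"].lower()):
            index[word].add(rec["iid"])
    return index


def search(index, query):
    """Return sorted IDs of records whose description has every query word."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Example with made-up descriptions.
records = [
    {"iid": 1, "descr": "SH2 domain binds phosphotyrosine"},
    {"iid": 2, "descr": "SH3 domain binds proline-rich peptide"},
]
index = build_keyword_index(records)
print(search(index, "domain binds"))
```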
Visualization of BIND data: 

Web based Java applets that will dynamically represent pathways and molecular complexes 
have been designed. These form the preferred front end of the BIND system. For example, when a 
pathway is graphically represented, the image is mouse clickable so that information about the record 
and other records in the database will be easily accessible. 
Implementation 

This section gives an overview of the BIND database. The implementation allows data entry 
and data retrieval supporting the full BIND 1.0 ASN.1 specification. Programmed fully using the C 
programming language for maximum speed and compatibility, a BIND application programming 
interface (API) has been written to allow applications to easily use data in the BIND database. The 
API makes use of two C libraries, the NCBI Toolkit (ftp://ncbi.nlm.nih.gov/toolbox) for ASN.1 
handling and more, and the CodeBase (http://www.sequiter.com/) database library for a database 
implementation. Using this API, web-based applications have been developed for data entry, retrieval 
and management. All data is entered and retrieved using web-based forms generated by CGI programs 
written in C. Interaction data is entered using this web-based user interface. 

The BIND database uses the Seqhound database system as a resource. Seqhound is a mirror 
of GenBank, the NCBI taxonomy database and the PDB (Bernstein et al., 1978) data in NCBI MMDB 
form (Hogue et al., 1996). Seqhound derived data allows BIND to quickly and easily use sequence, 
taxonomy and 3D molecular structure information for validation and for information retrieval. 

Data visualization and data mining systems have been designed for the database 
implementation. The spider will traverse BIND searching for new signaling pathways. It will traverse 
all pathway cross talk links looking for signaling routes that are not present in BIND. The results of 
this search will be potentially unknown cellular signaling pathways. The proteins of these new 
pathways can also be examined to see if they contain known binding domains, such as SH2 and SH3 
domains, which will increase the likelihood of pathway cross talk. The short list of newly found 
potential pathways can then be experimentally evaluated. 
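As an illustrative sketch only, the traversal performed by the spider may be modelled as a depth-first enumeration of routes through interaction links, keeping those routes that are not already stored as pathway records. The graph representation (directed molecule-to-molecule edges), the length limit and the name candidate_routes are assumptions for this example; the source does not give the spider's actual algorithm.

```python
def candidate_routes(edges, known_pathways, start, max_len=4):
    """List simple routes from `start` that are not stored as pathways.

    edges:          iterable of (source, target) interaction links
    known_pathways: iterable of molecule sequences already in the database
    max_len:        maximum number of molecules in a reported route
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    known = {tuple(p) for p in known_pathways}
    found = []

    def walk(path):
        # Report any route of two or more molecules that is not yet recorded.
        if len(path) > 1 and tuple(path) not in known:
            found.append(list(path))
        if len(path) == max_len:
            return
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # keep routes simple: never revisit a molecule
                walk(path + [nxt])

    walk([start])
    return found


# Example with made-up molecule names: R-A-B is a known pathway,
# and A also links across to C, which links to D.
edges = [("R", "A"), ("A", "B"), ("A", "C"), ("C", "D")]
known = [["R", "A"], ["R", "A", "B"]]
print(candidate_routes(edges, known, "R"))
```

Routes returned by such a search would still be only candidates; as the text notes, they would need to be screened (for example for known binding domains) and evaluated experimentally.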

With the current information garnered by genomic sequencing projects, homologous cell 
signaling pathways can be found in other organisms by knowing all of the gene products in a pathway 
in a related organism. Even between non-related organisms, certain 'housekeeping' pathways should 
not be expected to differ much. 

BIND can be used to find networks of biological signaling pathways whose topologies can 
support signal properties that simple pathways cannot. It has been shown that certain kinds of 
signaling networks have properties that cannot be seen with simple signal pathways. Storing of 
information, large-scale signal attenuation and signal control are some of these properties. It has been 
supposed that memory can have a basis in the long term storing of information in certain signaling 
pathways (Bhalla US and Iyengar R, Emergent Properties in Signaling Networks, Science 
283(5400):381-7, Jan 15, 1999). 

BIND can also be used to identify a biomolecular interaction that is similar to a reference 
biomolecular interaction stored in BIND. The user interface allows the user to initiate a similarity 
search for each molecule in the test biomolecular interaction. The results can be screened by selected 
taxonomy. A putative biomolecular interaction is then assembled to create a test record. The BIND 
database is then examined to match the test record with the records therein to produce a matching 
record containing a reference biomolecular interaction that matches the test biomolecular interaction. 
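The matching step may be sketched as follows, assuming the similarity search for each molecule (for example via the BLAST interface) has already returned a list of hits as (gi, taxonomy ID) pairs. The data layout, the function name match_interactions and the rule that a stored interaction matches when one partner is a hit for each test molecule are assumptions made for illustration.

```python
def match_interactions(hits_a, hits_b, stored, taxon=None):
    """Find stored interactions between a hit for 'a' and a hit for 'b'.

    hits_a, hits_b: lists of (gi, taxonomy_id) similarity-search hits
    stored:         list of dicts with keys 'iid', 'a', 'b' (partner GIs)
    taxon:          optional taxonomy ID used to screen the hits
    """
    def screen(hits):
        return {gi for gi, tax in hits if taxon is None or tax == taxon}

    set_a, set_b = screen(hits_a), screen(hits_b)
    matches = []
    for rec in stored:
        forward = rec["a"] in set_a and rec["b"] in set_b
        reverse = rec["a"] in set_b and rec["b"] in set_a
        if forward or reverse:
            matches.append(rec["iid"])
    return matches


# Example with made-up GI numbers; 9606 and 10090 are the NCBI taxonomy
# IDs for human and mouse.
hits_a = [(11, 9606), (12, 10090)]
hits_b = [(21, 9606), (22, 10090)]
stored = [
    {"iid": 1, "a": 11, "b": 21},
    {"iid": 2, "a": 22, "b": 12},
    {"iid": 3, "a": 11, "b": 99},
]
print(match_interactions(hits_a, hits_b, stored))
print(match_interactions(hits_a, hits_b, stored, taxon=9606))
```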

Examples of a BIND specification are attached as Appendix A and Appendix B. 

Having illustrated and described the principles of the invention in a preferred embodiment, it 
should be appreciated by those skilled in the art that the invention can be modified in arrangement and 
detail without departure from such principles. We claim all modifications coming within the scope of 
the following claims. 

All publications, patents and patent applications referred to herein are incorporated by 
reference in their entirety to the same extent as if each individual publication, patent or patent 
application was specifically and individually indicated to be incorporated by reference in its entirety. 
APPENDIX A 

-- $Revision: 0.5 $
--
-- BIND (Biomolecular Interaction Network Database) Interaction Record
-- by Gary Bader, Oct. 21, 1998
--
-- Hogue Lab - University of Toronto Biochemistry Department
-- Samuel Lunenfeld Research Institute, Mount Sinai Hospital
BIND-Interaction DEFINITIONS ::=
BEGIN

EXPORTS BIND-Interaction, BIND-interaction-set,
        BIND-Pathway, BIND-pathway-set,
        BIND-Molecular-Complex, BIND-complex-set;

IMPORTS Date FROM NCBI-General
        Bioseq FROM NCBI-Sequence
        Submit-block FROM NCBI-Submit
        Pub FROM NCBI-Pub
        Org-ref FROM NCBI-General
        Seq-loc, Seq-id FROM NCBI-Seqloc
        Biostruc FROM MMDB
        Biostruc-graph FROM MMDB-Chemical-graph
        Chem-graph-pntrs FROM MMDB-Features;

-- ***************
-- * Interaction *
-- ***************

-- *********************
-- A set of interactions
-- *********************

BIND-interaction-set ::= SEQUENCE OF BIND-Interaction

-- **********************************************************************

-- A BIND-Interaction record can store all of the details of the interaction
-- between any two molecules (or atoms).
-- Field description for BIND-Interaction
-- **************************************
-- date = date of record entry
-- updates = a list of updates for the record
-- iid = interaction accession number
-- pids = list of pathways that this interaction is involved in
-- mcids = list of molecular complexes that this interaction is involved in
-- a = molecule 'a' interacts with...
-- b = molecule 'b'
-- descr = description of interaction
-- source = empirical evidence references
-- sub = contact information of the submitter and general information about
--       this submission
-- priv = TRUE if this interaction is private
-- **********************************************************************

BIND-Interaction ::= SEQUENCE {
    date Date,
    updates BIND-update-set OPTIONAL,
    iid Interaction-id,
    pids SEQUENCE OF Pathway-id OPTIONAL,
    mcids SEQUENCE OF Molecular-Complex-id OPTIONAL,
    a BIND-object,
    b BIND-object,
    descr BIND-descr,
    source BIND-pub-set,
    sub Submit-block,
    priv BOOLEAN DEFAULT FALSE
}

Interaction-id ::= INTEGER
Pathway-id ::= INTEGER
Molecular-Complex-id ::= INTEGER

-- * Record Update *

-- A set of updates for a record

BIND-update-set ::= SEQUENCE OF BIND-update-object

-- **********************************
-- An update for a record
-- Field description for BIND-update
-- date = date of this update
-- descr = text description of update
-- **********************************



BIND-update-object ::= SEQUENCE { 
date Date, 
descr VisibleString 
} 

-- ***********************
-- * Biomolecular Object *
-- ***********************

-- **********************************************************************
-- Any chemical object
-- Field description for BIND-object
-- id = a choice of possible pointers (usually to accession numbers
--      of other databases) for different types of molecules that may
--      interact.
--      Choice of: none, protein, dna, rna, ligand (any other type of
--      chemical compound), interaction, molecular complex
-- origin = material source, org-ref for biological material, text
--          description for chemical compound of non biological origin
-- seq = space for sequence, if it is not in a public database
-- struc = space for complete structure, if not in public database
-- descr = optional text description of this object

BIND-object ::= SEQUENCE {
    id CHOICE {
        none NULL,
        interaction Interaction-id,
        complex Molecular-Complex-id,
        protein BIND-id,
        dna BIND-id,
        rna BIND-id,
        ligand BIND-ligand-id
    },
    origin CHOICE {
        org Org-ref,
        chem VisibleString
    },
    seq Bioseq OPTIONAL,
    struc Biostruc OPTIONAL,
    descr VisibleString OPTIONAL
}

-- * Identifiers *

-- General sequence or domain identifier
-- Field description for BIND-id
-- gi = NCBI main accession number
-- di = domain accession number (from the domain split database)
-- other = open field for other possible NCBI defined pointers
-- NOTE: there is a field for gi in 'other Seq-id', but it should not be used
--       in this specification

BIND-id ::= SEQUENCE {
    gi Geninfo-id,
    di Domain-id OPTIONAL,
    other Seq-id OPTIONAL
}

Geninfo-id ::= INTEGER 



Domain-id ::= INTEGER 



-- Pointers to various ligand databases (needs to be expanded e.g. CAS reg.#)
-- Field description for BIND-ligand
-- *********************************
-- internal = an accession number describing an internally kept structure of a
--            chemical compound (composite database of LIGAND DB and Klotho DB)
-- other-db = generic pointer to any other database (e.g. Japanese ligand db)
--            Contains the name of the database, an integer pointer and a string
--            pointer.

BIND-ligand-id ::= CHOICE {
    internal Internal-ligand-id,
    other-db BIND-other-db
}

Internal-ligand-id ::= INTEGER

BIND-other-db ::= SEQUENCE {
    dbname VisibleString,
    intp INTEGER OPTIONAL,
    strp VisibleString OPTIONAL
}



-- ****************
-- * Publications *
-- ****************

-- ******************************
-- This holds a publication set
-- Field description for BIND-pub
-- ******************************
-- disputed = TRUE if the interaction is disputed in the pub-set
-- pubs = a sequence of pub-objects
-- ******************************

BIND-pub-set ::= SEQUENCE {
    disputed BOOLEAN DEFAULT FALSE,
    pubs SEQUENCE OF BIND-pub-object
}

-- **********************************************************************
-- A publication
-- Field description for BIND-pub-object
-- descr = optional text description of this object
-- opinion = does this publication support or dispute the data
-- pub = publication reference

BIND-pub-object ::= SEQUENCE {
    descr VisibleString OPTIONAL,
    opinion ENUMERATED {
        none (0),
        support (1),
        dispute (2)
    },
    pub Pub
}

-- * Interaction Description *

-- Full description for an interaction
-- Field description for BIND-descr
-- simple-descr = text description
-- loc = description of location of interaction
-- cond = binding conditions/experimental conditions
-- cons = conserved sequence element comment
-- interaction = graph of the actual atoms or sequence elements
--               involved in the interaction
-- action = list of chemical actions that can occur in this interaction
-- state = list of active states of 'a' and 'b' as well as required
--         activity states for interaction to occur

BIND-descr ::= SEQUENCE {
    simple-descr VisibleString OPTIONAL,
    loc BIND-loc OPTIONAL,
    cond BIND-condition-set OPTIONAL,
    cons BIND-cons-seq-set OPTIONAL,
    interaction Biostruc-graph OPTIONAL,
    action BIND-action-set OPTIONAL,
    state BIND-state-descr OPTIONAL
}



-- * Interaction Location *

-- Top level interaction location descriptor
-- Field description for BIND-loc
-- gen-loc = general cellular location where this interaction takes place
--           (computer readable)
-- spec-loc = a fully specific location of the interaction
--            (human readable)
-- source = empirical evidence references
-- descr = text description

BIND-loc ::= SEQUENCE {
    gen-loc BIND-gen-loc-set,
    spec-loc BIND-spec-loc-set OPTIONAL,
    source BIND-pub-set OPTIONAL,
    descr VisibleString OPTIONAL
}



-- **********************************************************************
-- General start and end locations for an interaction
-- Field description for BIND-gen-loc-set
-- **************************************
-- start = general location where this interaction takes place
-- end = general location where this interaction ends
-- descr = text description of this object
-- **********************************************************************

BIND-gen-loc-set ::= SEQUENCE {
    start BIND-gen-loc,
    end BIND-gen-loc OPTIONAL,
    descr VisibleString OPTIONAL
}



-- **********************************************************************
-- General cellular location where this interaction takes place
-- Field description for BIND-gen-loc
-- **********************************
-- An enumeration of general cell locations
-- extracellular = extracellular
-- cytoplasm = in cytoplasm
-- organelle = in an organelle
-- nucleus = in nucleus
-- membr-cell-cyt = on the cytoplasmic side of the cell membrane
-- membr-cell-in = in the cell membrane
-- membr-cell-ext = on the surface of the cell membrane
-- membr-outer-peri = on the periplasmic side of the outer membrane
-- membr-outer-in = in the outer membrane
-- membr-outer-ext = on the surface of the outer membrane
-- cellwall-cell = on the inside surface of cell wall
-- cellwall-in = in the cell wall
-- cellwall-ext = on the outside surface of the cell wall
-- other = other location - see text description in BIND-gen-loc-set
-- **********************************************************************

BIND-gen-loc ::= ENUMERATED {
    not-specified (0),
    extracellular (1),
    cytoplasm (2),
    organelle (3),
    nucleus (4),
    membr-cell-cyt (5),
    membr-cell-in (6),
    membr-cell-ext (7),
    membr-outer-peri (8),
    membr-outer-in (9),
    membr-outer-ext (10),
    cellwall-cell (11),
    cellwall-in (12),
    cellwall-ext (13),
    other (255)
}



-- Specific start and end locations for an interaction
-- Field description for BIND-spec-loc
-- ***********************************
-- start = specific location where this interaction takes place
-- end = specific location where this interaction ends
-- *******************************************************

BIND-spec-loc-set ::= SEQUENCE {
    start BIND-spec-loc,
    end BIND-spec-loc OPTIONAL
}



-- *******************************************************
-- Specific location of interaction with respect to a cell
-- Field description for BIND-spec-loc
-- ***********************************
-- location = text specific location
-- descr = text location further description
-- cell-type = text cell type
-- *******************************************************

BIND-spec-loc ::= SEQUENCE {
    location VisibleString,
    descr VisibleString OPTIONAL,
    cell-type VisibleString OPTIONAL
}

-- **************************
-- * Interaction conditions *
-- **************************

-- **********************************
-- A list of experimental conditions.
-- **********************************

BIND-condition-set ::= SEQUENCE OF BIND-conditions

-- *******************************************************
-- An experimental condition that has been used to observe
-- this interaction. Interaction must be reproducible
-- using this information.
-- Field description for BIND-conditions
-- *************************************
-- conditions = list of possible experimental conditions
-- system = experimental system used
-- descr = text description
-- source = empirical evidence

BIND-conditions ::= SEQUENCE {
    conditions ENUMERATED {
        in-vitro (0),
        in-vivo (1),
        other (255)
    },
    system VisibleString OPTIONAL,
    descr VisibleString OPTIONAL,
    source BIND-pub-set OPTIONAL
}



-- **********************************
-- * Interaction conserved sequence *
-- **********************************

-- Conserved sequence set
-- Only relevant for biological sequences.
-- Derived from alignment information.
-- Field description for BIND-cons-seq-set
-- ***************************************
-- a = conserved sequence of 'a'
-- b = as above, for 'b'
-- *******************************************************

BIND-cons-seq-set ::= SEQUENCE {
    a BIND-conserved-seq OPTIONAL,
    b BIND-conserved-seq OPTIONAL
}



-- *******************************************************
-- Conserved sequence
-- Field description for BIND-conserved-seq
-- ****************************************
-- seq-el = these sequence elements have been shown to be conserved
-- descr = further text description
-- source = empirical evidence
-- *******************************************************

BIND-conserved-seq ::= SEQUENCE {
    seq-el Seq-loc,
    descr VisibleString OPTIONAL,
    source BIND-pub-set OPTIONAL
}



-- * Interaction chemical action *

-- A set of chemical actions.
-- Field description for BIND-action
-- *********************************
-- max-iaid = the highest iaid used so far in this set
-- actions = set of BIND-action objects
-- ***************************************************

BIND-action-set ::= SEQUENCE {
    max-iaid Internal-Action-id,
    actions SEQUENCE OF BIND-action
}

-- ***************************************************
-- A chemical action.
-- Field description for BIND-action
-- iaid = iid-internal action id
--        (unique for each action in a BIND-action sequence in
--        an interaction)
-- descr = further text description
-- a-on-b = TRUE if 'a' is acting on 'b', FALSE for opposite
-- action = choice of none, add, remove, change, cut, other
--          (contains an object and a location for each action)
-- kinetics = chemical action kinetics
-- source = empirical evidence
-- ***************************************************

BIND-action ::= SEQUENCE {
    iaid Internal-Action-id,
    descr VisibleString OPTIONAL,
    a-on-b BOOLEAN DEFAULT TRUE,
    action CHOICE {
        none NULL,
        add BIND-action-object,
        remove BIND-action-object,
        change BIND-action-object,
        cut BIND-action-object,
        other BIND-action-object
    },
    kinetics BIND-kinetics OPTIONAL,
    source BIND-pub-set OPTIONAL
}

Internal-Action-id ::= INTEGER

-- ***************************************************
-- Object used by BIND-action.
-- Field description for BIND-action-object
-- what = object that is being acted upon (for none, add,
--        remove, cut, other)
-- to = object that this was changed to (for change only)
-- where = location of action
-- descr = optional text description of this object

BIND-action-object ::= SEQUENCE {
    what BIND-object,
    to BIND-object OPTIONAL,
    where Chem-graph-pntrs,
    descr VisibleString OPTIONAL
}



-- ***************************************************
-- Chemical kinetics
-- Field description for BIND-kinetics
-- ***********************************
-- descr = optional text description of this object
-- kd = dissociation constant of interaction
-- km = Michaelis-Menten constant
-- vmax = max. velocity of reaction
-- conc-a = concentration of 'a' object
-- conc-b = as above, for 'b'
-- temp = temperature of the interaction system (observed)
-- pH = pH of the interaction system
-- half-life-a = 1/2 life for 'a'
-- half-life-b = 1/2 life for 'b'
-- buffer = buffer text description
-- other = any other kinetic related values (e.g. k1, k2...)
-- source = empirical evidence
-- ***************************************************

BIND-kinetics ::= SEQUENCE {
    descr VisibleString OPTIONAL,
    kd RealVal-Units OPTIONAL,
    km RealVal-Units OPTIONAL,
    vmax RealVal-Units OPTIONAL,
    conc-a RealVal-Units OPTIONAL,
    conc-b RealVal-Units OPTIONAL,
    temp RealVal-Units OPTIONAL,
    ph RealVal-Units OPTIONAL,
    half-life-a RealVal-Units OPTIONAL,
    half-life-b RealVal-Units OPTIONAL,
    buffer VisibleString OPTIONAL,
    other SEQUENCE OF BIND-kinetics-other OPTIONAL,
    source BIND-pub-set OPTIONAL
}

BIND-kinetics-other ::= SEQUENCE {
    descr VisibleString,
    value RealVal-Units
}

-- ***************************************************
-- A Real Number
-- Field description for RealVal
-- Basic scientific notation
-- (scaled-real-value * 10^scale-factor)
-- units = string value of the units involved, e.g. mL, M, etc.

RealVal-Units ::= SEQUENCE {
    scale-factor INTEGER,
    scaled-real-value REAL,
    units VisibleString OPTIONAL
}



-- * Interaction object chemical state *
-- *************************************
-- Chemical state and required chemical state for objects 'a' and 'b'
-- Field description for BIND-state-descr
-- **************************************
-- a = list of possible activity states for 'a'
-- a-required-state = the state that 'a' is required to assume before interaction
--                    takes place.
-- b = list of possible activity states for 'b'
-- b-required-state = the state that 'b' is required to assume before interaction
--                    takes place.

BIND-state-descr ::= SEQUENCE {
    a BIND-state-set OPTIONAL,
    a-required-state BIND-required-state OPTIONAL,
    b BIND-state-set OPTIONAL,
    b-required-state BIND-required-state OPTIONAL
}



-- *******************************************************
-- A set of chemical states
-- Field description for BIND-state-set
-- ************************************
-- max-isid = highest Internal-State-id used in this set
-- states = list of possible chemical states
-- *******************************************************

BIND-state-set ::= SEQUENCE {
    max-isid Internal-State-id,
    states SEQUENCE OF BIND-state
}

Internal-State-id ::= INTEGER

-- A chemical state

-- Field description for BIND-state
-- ********************************
-- isid = Internal-State-id (unique for each state in a BIND-state-set)
-- activity = activity of molecule
-- activity-level = integer level of activity, baseline activity=0
-- cause = sequence of actions that caused current level of activity
-- descr = further text description
-- source = empirical evidence for this state
-- **********************************************************************

BIND-state ::= SEQUENCE {
    isid Internal-State-id,
    activity ENUMERATED {
        none (0),
        active (1),
        inactive (2)
    },
    activity-level INTEGER OPTIONAL,
    cause SEQUENCE OF BIND-state-cause OPTIONAL,
    descr VisibleString OPTIONAL,
    source BIND-pub-set OPTIONAL
}



-- ***************************************************************
-- Cause of a chemical state
-- Uniquely locates a chemical action by iid then by int-ac-id.
-- Field description for BIND-state-cause
-- **************************************
-- from-iid = chemical action is from this iid
-- cause = chemical action id number that caused this activity
-- ***************************************************************

BIND-state-cause ::= SEQUENCE {
    from-iid Interaction-id,
    cause Internal-Action-id
}

-- ***********************************************
-- A required chemical state
-- Field description for BIND-required-state
-- isid = Internal-State-id of the required state
-- source = empirical evidence
-- ***********************************************

BIND-required-state ::= SEQUENCE {
    isid Internal-State-id,
    source BIND-pub-set OPTIONAL
}

-- * Molecular-Complex *
-- A set of Molecular Complexes

BIND-complex-set ::= SEQUENCE OF BIND-Molecular-Complex



-- ***************************************************************
-- A molecular complex record
-- Field description for BIND-molecular-complex
-- ********************************************
-- date = date of record entry
-- updates = a list of updates for the record
-- mcid = molecular complex accession number.
-- descr = text description of interaction
-- sub-num = total number of sub-units in this complex
-- sub-units = list of pointers to the actual sub-units
-- interaction-order = the order of interactions that take place to form
--                     this complex.
-- complex-assembly = a chemical graph of the interaction complex
--                    (with molecules as nodes)
-- source = empirical evidence references
-- sub = contact information of the submitter and general information about
--       this submission
-- priv = TRUE if this complex is private

BIND-Molecular-Complex ::= SEQUENCE {
    date Date,
    updates BIND-update-set OPTIONAL,
    mcid Molecular-Complex-id,
    descr VisibleString OPTIONAL,
    sub-num INTEGER,
    sub-units SEQUENCE OF BIND-object,
    interaction-order SEQUENCE OF Interaction-id,
    complex-assembly Biostruc-graph OPTIONAL,
    source BIND-pub-set,
    sub Submit-block,
    priv BOOLEAN DEFAULT FALSE
}



-- *********************************
-- * Biomolecular chemical pathway *
-- *********************************

-- *****************
-- A set of pathways
-- *****************

BIND-pathway-set ::= SEQUENCE OF BIND-Pathway

-- A pathway record.
-- Field description for BIND-pathway
-- **********************************
-- date = date of record entry
-- updates = a list of updates for the record
-- pid = pathway accession number
-- pathway = a collection of interactions and signal modification objects
-- descr = descriptors for a pathway
-- source = empirical evidence references
-- sub = contact information of the submitter and general information about
--       this submission
-- priv = TRUE if this pathway is private
-- **************************************************************

BIND-Pathway ::= SEQUENCE {
    date Date,
    updates BIND-update-set OPTIONAL,
    pid Pathway-id,
    pathway SEQUENCE OF BIND-pathway-object,
    descr BIND-path-descr,
    source BIND-pub-set,
    sub Submit-block,
    priv BOOLEAN DEFAULT FALSE
}

-- One node in a pathway graph
-- Field description for BIND-pathway-object
-- *****************************************
-- iid = interaction id reference
-- signal = pathway signal change mediated by this iid.
-- **************************************************************

BIND-pathway-object ::= SEQUENCE {
    iid Interaction-id,
    signal BIND-delta-signal
}


-- **************************************************************
-- A chemical signal change
-- Field description for BIND-delta-signal
-- ***************************************
-- action = signal modification
-- factor = the factor of the amplification or the repression
-- descr = further text description
-- **************************************************************

BIND-delta-signal ::= SEQUENCE {
    action ENUMERATED {
        none (0),
        amplify (1),
        repress (2),
        auto-amplify-a (3),
        auto-repress-a (4),
        other (255)
    },
    factor RealVal-Units OPTIONAL,
    descr VisibleString OPTIONAL
}

-- Pathway description
-- Field description for BIND-path-descr
-- descr = text description of pathway
-- cell-cycle = if applicable, stage of a cell cycle that this pathway
--              is in effect
-- developmental-stage = developmental stage of an organism,
--                       if applicable, that this pathway is in effect

BIND-path-descr ::= SEQUENCE {
    descr VisibleString OPTIONAL,
    cell-cycle BIND-cellstage OPTIONAL,
    developmental-stage BIND-devstage OPTIONAL
}



-- Cell cycle stage
-- Field description for BIND-cellstage
-- ************************************
-- phase = phase of cell cycle
-- descr = text description of cell stage
-- **************************************************************

BIND-cellstage ::= SEQUENCE {
    phase ENUMERATED {
        none (0),
        constitutive (1),
        interphase (2),
        division (3),
        g1 (4),
        s (5),
        g2 (6),
        mitosis (7),
        prophase (8),
        prometaphase (9),
        metaphase (10),
        anaphase (11),
        telophase (12),
        cytokinesis (13),
        meiosis (14),
        prophase1 (15),
        leptotene (16),
        zygotene (17),
        pachytene (18),
        diplotene (19),
        diakinesis (20),
        metaphase1 (21),
        anaphase1 (22),
        telophase1 (23),
        meiotic-cytokinesis (24),
        prophase2 (25),
        metaphase2 (26),
        anaphase2 (27),
        telophase2 (28),
        meiotic-cytokinesis2 (29),
        other (255)
    },
    descr VisibleString OPTIONAL
}

-- Organism developmental stage
-- Field description for BIND-devstage
-- ***********************************
-- stage = text description of developmental stage

BIND-devstage ::= SEQUENCE {
    stage VisibleString
}



-- ********************************
-- * GI/DI cross reference record *
-- ********************************

-- *************************************************
-- Cross reference for gi/di searching
-- Field description for BIND-Cross-Ref
-- ************************************
-- gi = gi number
-- di = di number
-- iids = list of interactions that this gi is involved in
-- pids = list of pathways that this gi is involved in
-- mcids = list of molecular complexes that this gi is involved in
-- *************************************************

BIND-Cross-Ref ::= SEQUENCE {
    gi Geninfo-id DEFAULT 0,
    di Domain-id DEFAULT 0,
    iids SEQUENCE OF Interaction-id,
    pids SEQUENCE OF Pathway-id OPTIONAL,
    mcids SEQUENCE OF Molecular-Complex-id OPTIONAL
}



-- * PMID/MUID cross reference record *
-- Cross reference for pmid/muid searching
-- Field description for BIND-Pub-Cross-Ref
-- uid = muid or pmid
-- iids = list of interactions with this publication as a reference
-- pids = list of pathways with this publication as a reference
-- mcids = list of molecular complexes with this publication as a reference

BIND-Pub-Cross-Ref ::= SEQUENCE {
    uid INTEGER,
    iids SEQUENCE OF Interaction-id,
    pids SEQUENCE OF Pathway-id OPTIONAL,
    mcids SEQUENCE OF Molecular-Complex-id OPTIONAL
}



END 
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APPENDIX B 
"SRevision: 1.1 $ 

-- ♦♦^^^^^^^^ 

Biomolecular Interaction Network Database (BIND) 

— Data Specification 

— Interaction, Molecular Complex, Biological Pathway Data Structures 



~ Authors: Gary D. Bader, Christopher W.V. Hogue 
bader@mshri.on.ca hogue@mshri.on.ca 

- Hogue Lab - University of Toronto Biochemistry Department and the 
Samuel Lunenfeld Research Institute, Mount Sinai Hospital 

~ http://bioinfo.mshri.on.ca hogue@mshri.on.ca 

-- REVISIONS
-- Revision 0.1 - Oct. 21, 1998
-- Revision 0.5 - Feb. 2, 1999 (BIND web based data entry prototype)
-- Revision 0.6 - Feb. 26, 1999 (Feedback from Biophysical Soc. Conf.)
-- Revision 0.8 - May 3, 1999
-- Revision 0.9 - May 31, 1999
-- Revision 1.0 - June 7, 1999 (comments only added to 0.9)
-- Revision 1.1 - Some changes and additions
--   Removed iid from BIND-object
--   Changed Pub to pub-set in database-site
--   Moved BIND-cellstage object definition higher in specification (aesthetic change)
--   Removed OPTIONAL from activity-level in BIND-state
--   Added 'not-specified' to BIND-Submitter/subtype and removed the OPTIONAL.
--   Added an OPTIONAL BIND-mol-sub-num object to BIND-mol-object
--     to account for more complicated molecular complexes
--   Added OPTIONAL BIND-condition-dependency to BIND-loc-site
--   Added boolean intramolecular field to BIND-descr
--   Added 'none' to BIND-action-type to allow e.g. kinetics data to be stored with no action
--   Added enzyme activity amplification factor to listed kinetics values
--   Added structured address fields to BIND-Contact-info object
--   Added pathological-state to BIND-path-descr
--   Added author list to Interaction, Complex and Pathway record
--   Added more options to BIND-gen-place enumerated type
--   Added 'break' as choice in BIND-action-type object
--   Added list of synonyms for short-label in object
--   Added list of complex exclusive interactions in molecular complex record
--   Added in-situ to BIND-condition general field as a choice
--   Added optional BioSource to BIND-chemsource for natural products
--   Added simple 'active' choice to BIND-state activity-level to allow description of active/inactive state
--   Removed sub fields from IID, MCID, PID cross reference records. This info clutters the cross-reference record
--     and will make a cross-ref system less efficient and reduces privacy of submitter information. Rather save
--     submitter information in another database that the specification does not impose structure on.
--   Added generic BIND-direction object to describe four cases a-a,a-b,b-a,b-b
--   Changed a-on-b field in BIND-action and a-to-b field in BIND-signal to BIND-direction. In BIND-action
--     this will account for autophosphorylation and similar self changing events
--   Removed auto-amplify and auto-repress as choices in BIND-signal
--   Added optional list of accession numbers to BIND-submit
--   Added better description of submission tool to BIND-Submitter
--   Added sub-unit fields to BIND-loc-site, BIND-action and BIND-state so a sub-unit in a complex
--     can be specified.
--   Added optional text description to BIND-required-state
--   Added optional text description to BIND-loc-site
--   Added Bioseq in experimental sequence field in BIND-condition
--
-- See ftp://bioinfo.mshri.on.ca/pub/BIND/Spec/bind.asn for the latest revision.



-- NOTE: This specification is in a variant of ASN.1 1990 that may not
--       be compatible with newer ASN.1 tools. This specification also
--       depends on v6.1 public domain specifications available from the
--       U.S. National Center for Biotechnology Information (NCBI)
--       ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools/
--       http://www.ncbi.nlm.nih.gov/Toolbox/
--***************************************************************************

BIND DEFINITIONS ::=
BEGIN

EXPORTS BIND-Submit,
    BIND-Interaction, BIND-Interaction-set,
    BIND-Pathway, BIND-Pathway-set,
    BIND-Molecular-Complex, BIND-Complex-set;

IMPORTS Date, Int-fuzz FROM NCBI-General
    Author FROM NCBI-Biblio
    Bioseq FROM NCBI-Sequence
    Pub FROM NCBI-Pub
    BioSource FROM NCBI-BioSource
    Seq-loc, Seq-id FROM NCBI-Seqloc
    Biostruc FROM MMDB
    Biostruc-feature-set FROM MMDB-Features;



--<!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!>
--*****************************************
--* Data Submission and Database exchange *
--*****************************************

--***************************************************************************
-- General data exchange
-- This object is used to submit all information to BIND.
-- Field description for BIND-Submit
--***************************************************************************
-- date = date of creation of this set
-- database = description of database where this data originated
-- sub = person who is responsible for this data submission
-- sub-id = BIND submit ID for this submission
-- acc-nums = list of BIND accession numbers that this submission contains
-- interactions = a collection of interaction records
-- complexes = a collection of molecular complex records
-- pathways = a collection of pathway records
--***************************************************************************

BIND-Submit ::= SEQUENCE {
    date Date,
    database BIND-Database-site OPTIONAL,
    sub BIND-Submitter,
    sub-id BIND-Submit-id OPTIONAL,
    acc-nums SEQUENCE OF BIND-accession-number OPTIONAL,
    interactions BIND-Interaction-set OPTIONAL,
    complexes BIND-Complex-set OPTIONAL,
    pathways BIND-Pathway-set OPTIONAL

} 

BIND-Submit-id ::= INTEGER 

BIND-accession-number ::= CHOICE {
interaction Interaction-id, 
complex Molecular-Complex-id, 
pathway Pathway-id 
} 

--************************
--* Database description *
--************************

--***************************************************************************
-- Description of a database site
-- Field description for BIND-Database-site
--***************************************************************************
-- descr = text description of this database
--         (e.g. C. elegans interaction database)
-- country = country where this database is based
-- homepage-url = Internet Universal Resource Locator for the database web site
--         (e.g. http://bioinfo.mshri.on.ca)
-- reference = a Medline reference for this database
--***************************************************************************

BIND-Database-site ::= SEQUENCE {
    descr VisibleString,
    country VisibleString,
    homepage-url VisibleString OPTIONAL,
    reference BIND-pub-set OPTIONAL
} 



--* Submitter *

--***************************************************************************
-- Description of a submitter (Adaptation of NCBI Submit-Block)
-- Field description for BIND-Submitter
--***************************************************************************
-- contact = submitter contact information
-- hup = hold this submission until published
-- subtype = submission type
-- tool = tool used to submit record (e.g. BIND Web Data Entry version 1.0)
--***************************************************************************

BIND-Submitter ::= SEQUENCE {
    contact BIND-Contact-info,
    hup BOOLEAN DEFAULT FALSE,
    subtype ENUMERATED {
        not-specified (0),
        new (1),        -- new data
        update (2),     -- update by author
        revision (3),   -- 3rd party (non-author) update
        other (255) },
    tool BIND-Submission-tool OPTIONAL
}

BIND-Submission-tool ::= SEQUENCE {
    name VisibleString,
    version VisibleString,
    descr VisibleString OPTIONAL
}

--***********************
--* Contact Information *
--***********************

--***************************************************************************
-- Contact information (Adaptation of NCBI Contact-info)
-- Field description for BIND-Contact-info
--***************************************************************************
-- first-name = First name of submitter
-- middle-initial = Middle initial of submitter
-- last-name = Last name of submitter
-- address = Street address of submitter
-- room = Room number
-- dept = Department
-- institute = Institute if this is different than organization
--         (e.g. research institute)
-- organization = Organization (e.g. University of Toronto)
-- city = City
-- pcode = Zip or postal code
-- country = Country
-- phone = Phone number (with area code)
-- fax = Fax number (with area code)
-- email = E-mail address
-- userid = User ID number
-- other = any other contact information
--***************************************************************************

BIND-Contact-info ::= SEQUENCE {
    first-name VisibleString OPTIONAL,
    middle-initial VisibleString OPTIONAL,
    last-name VisibleString OPTIONAL,
    address SEQUENCE OF VisibleString OPTIONAL,
    room VisibleString OPTIONAL,
    dept VisibleString OPTIONAL,
    institute VisibleString OPTIONAL,
    organization VisibleString OPTIONAL,
    city VisibleString OPTIONAL,
    pcode VisibleString OPTIONAL,
    country VisibleString OPTIONAL,
    phone VisibleString OPTIONAL,
    fax VisibleString OPTIONAL,
    email VisibleString OPTIONAL,
    userid INTEGER OPTIONAL,
    other SEQUENCE OF VisibleString OPTIONAL

} 

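--****************
--* Publications *
--****************

--***************************************************************************
-- A set of publications
-- Field description for BIND-pub-set
--***************************************************************************
-- disputed = TRUE if a BIND-pub-object in this set contains a dispute flag
-- pubs = a sequence of BIND-pub-objects
--***************************************************************************

BIND-pub-set ::= SEQUENCE {
    disputed BOOLEAN DEFAULT FALSE,
    pubs SEQUENCE OF BIND-pub-object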
} 
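
The disputed flag in BIND-pub-set summarizes the opinion values of the publications it contains. A minimal Python sketch of that rule follows; the dictionary layout and the helper name `make_pub_set` are illustrative assumptions, while the opinion codes (none 0, support 1, dispute 2) are those of the BIND-pub-object enumeration in this specification.

```python
# Opinion codes as enumerated in BIND-pub-object.
OPINION_NONE, OPINION_SUPPORT, OPINION_DISPUTE = 0, 1, 2


def make_pub_set(pubs):
    """Build a pub-set dict whose 'disputed' flag is derived from its pubs.

    Each pub is a dict with at least an 'opinion' key (hypothetical layout).
    """
    pubs = list(pubs)
    disputed = any(p["opinion"] == OPINION_DISPUTE for p in pubs)
    return {"disputed": disputed, "pubs": pubs}


if __name__ == "__main__":
    example = make_pub_set([
        {"descr": "original report", "opinion": OPINION_SUPPORT},
        {"descr": "failed replication", "opinion": OPINION_DISPUTE},
    ])
    print(example["disputed"])
```

Deriving the flag at construction time keeps it consistent with the list, so a reader can test one boolean instead of scanning every publication.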



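--***************************************************************************
-- A publication
-- Field description for BIND-pub-object
--***************************************************************************
-- descr = text description of this object
-- opinion = does this publication support or dispute the data?
-- pub = full NCBI publication reference
--***************************************************************************

BIND-pub-object ::= SEQUENCE {
    descr VisibleString OPTIONAL,
    opinion ENUMERATED {
        none (0),
        support (1),
        dispute (2)
    },
    pub Pub
}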



--*****************
--* Record Update *
--*****************

--***************************************************************************
-- An update for a record
-- Field description for BIND-update
--***************************************************************************
-- date = date of this update
-- descr = text description of update (this can store any update information
--         up to the entire previous version of the record in ASN.1)
--***************************************************************************

BIND-update-object ::= SEQUENCE {
    date Date,
    descr VisibleString
}

--***************************************************************************
-- Cell cycle stage
-- Field description for BIND-cellstage
--***************************************************************************
-- phase = phase of cell cycle
-- descr = text description of cell stage (e.g. if 'other' is specified)
--***************************************************************************

BIND-cellstage ::= SEQUENCE {
    phase INTEGER {
        not-specified (0),
        constitutive (1),
        interphase (2),
        division (3),
        g1 (4),
        s (5),
        g2 (6),
        mitosis (7),
        prophase (8),
        prometaphase (9),
        metaphase (10),
        anaphase (11),
        telophase (12),
        cytokinesis (13),
        meiosis (14),
        prophase1 (15),
        leptotene (16),
        zygotene (17),
        pachytene (18),
        diplotene (19),
        diakinesis (20),
        metaphase1 (21),
        anaphase1 (22),
        telophase1 (23),
        meiotic-cytokinesis (24),
        prophase2 (25),
        metaphase2 (26),
        anaphase2 (27),
        telophase2 (28),
        meiotic-cytokinesis2 (29),
        other (255)
    },
    descr VisibleString OPTIONAL
} 



--***************************************************************************
-- A Real Number
-- Field description for RealVal-Units
--***************************************************************************
-- scaled-integer-value * 10^(scale-factor)
-- units = string value of the units involved (e.g. ml, M, etc.)
--***************************************************************************

RealVal-Units ::= SEQUENCE {
    scale-factor INTEGER,
    scaled-integer-value INTEGER,
    units VisibleString OPTIONAL
} 
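
RealVal-Units stores a real number as an integer mantissa and a power-of-ten exponent. The Python sketch below evaluates scaled-integer-value * 10^(scale-factor) exactly; the function name `realval_to_fraction` and the example value 89.09 g/mol are illustrative assumptions rather than part of the specification.

```python
from fractions import Fraction


def realval_to_fraction(scale_factor, scaled_integer_value):
    """Return scaled_integer_value * 10**scale_factor as an exact Fraction.

    Using Fraction avoids floating-point rounding for negative exponents.
    """
    return Fraction(scaled_integer_value) * Fraction(10) ** scale_factor


if __name__ == "__main__":
    # A molecular weight of 89.09 g/mol stored as 8909 with scale factor -2.
    weight = realval_to_fraction(-2, 8909)
    print(float(weight), "g/mol")
```

Keeping both fields as integers lets a record carry a decimal value without depending on the floating-point encoding of any particular ASN.1 tool.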



--<!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!><!>
--***************
--* Interaction *
--***************

--***************************************************************************
-- A set of interactions
-- Field description for BIND-Interaction-set
--***************************************************************************
-- date = date this set of records was collected
-- database = name and description of database from which this set originates
-- interactions = set of interaction records
--***************************************************************************

BIND-Interaction-set ::= SEQUENCE {
    date Date OPTIONAL,
    database BIND-Database-site OPTIONAL,
    interactions SEQUENCE OF BIND-Interaction
}

--***************************************************************************
-- A BIND-Interaction record can store all of the details of an interaction
-- between any two molecules (or atoms).
-- Field description for BIND-Interaction
--***************************************************************************
-- date = date of record entry
-- updates = a list of updates for this record
-- iid = interaction accession number
-- a = molecule 'a' interacts with...
-- b = molecule 'b'
-- descr = description of interaction
-- source = empirical evidence references (publications)
-- authors = person(s) who authored this record.
-- priv = TRUE if this interaction is private
-- NOTE: In the context of this data specification, the 'priv' flag means:
--       -Do not export this record.
--       -In a public database, this record is not available to be publicly
--        retrieved.
--       -In a private database, this record can be retrieved, but it will
--        not be exported.
--***************************************************************************

BIND-Interaction ::= SEQUENCE {
    date Date,
    updates SEQUENCE OF BIND-update-object OPTIONAL,
    iid Interaction-id,
    a BIND-object,
    b BIND-object,
    descr BIND-descr,
    source BIND-pub-set,
    authors SEQUENCE OF Author OPTIONAL,
    priv BOOLEAN DEFAULT FALSE
} 
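
The three rules attached to the 'priv' flag can be expressed as two small filters. The following Python sketch assumes records are plain dictionaries with an optional `priv` key (absent meaning FALSE, as with `DEFAULT FALSE`); the function names `exportable` and `retrievable` are hypothetical.

```python
def exportable(records):
    """Records that may be exported: private records are never exported."""
    return [r for r in records if not r.get("priv", False)]


def retrievable(records, public_database):
    """Records that may be retrieved from a database.

    A public database hides private records; a private database returns
    every record, although private ones still may not be exported.
    """
    return exportable(records) if public_database else list(records)


if __name__ == "__main__":
    records = [{"iid": 1}, {"iid": 2, "priv": True}]
    print([r["iid"] for r in exportable(records)])
    print([r["iid"] for r in retrievable(records, public_database=False)])
```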

Interaction-id ::= INTEGER

--***********************
--* Biomolecular Object *
--***********************

--***************************************************************************
-- Any chemical object
-- Field description for BIND-object
--***************************************************************************
-- short-label = short label of this object (e.g. ATP, S4, HSP70)
-- short-label-syn = list of short-label synonyms for this object
-- id = the type of chemical object and a pointer to a record in a database
--      of the object type (e.g. protein database)
--      Choice of: not-specified, protein, DNA, RNA, small-molecule (any other type of
--      chemical compound), interaction, molecular complex
-- origin = material source (biological or chemical origin)
-- cell-stage = description of cell cycle stages this object is specific to
-- seq = space for sequence, if it is not in a public database
--       ALSO, this can be a consensus sequence for binding of this object
--       (e.g. transcription factor binding to DNA)
-- struc = space for complete structure, if not in public database
--       (This should not be used to store a structure that is already in
--       the MMDB)
-- descr = text description of this object
--***************************************************************************

BIND-object ::= SEQUENCE {
    short-label VisibleString,
    short-label-syn SEQUENCE OF VisibleString OPTIONAL,
    id BIND-object-type-id,
    origin BIND-object-origin,
    cell-stage SEQUENCE OF BIND-cellstage OPTIONAL,
    seq Bioseq OPTIONAL,
    struc Biostruc OPTIONAL,
    descr VisibleString OPTIONAL
}

BIND-object-type-id ::= CHOICE {
    not-specified NULL,
    protein BIND-id,
    dna BIND-id,
    rna BIND-id,
    small-molecule BIND-small-molecule-id,
    complex Molecular-Complex-id
}

BIND-object-origin ::= CHOICE {
    not-specified NULL,
    org BioSource,
    chem BIND-chemsource
}



--***************************************************************************
-- Summary description of a chemical compound
-- Field description for BIND-chemsource
--***************************************************************************
-- names = chemical compound name and any synonyms
-- smiles-string = standard smiles-string for this compound
--     References for SMILES language:
--     D. Weininger, SMILES, a Chemical Language and Information System.
--     1. Introduction to Methodology and Encoding Rules,
--     J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
--     Web sites:
--     http://www.daylight.com/dayhtml/smiles/smiles-intro.html
--     http://www2.ccc.uni-erlangen.de/services/smiles.html
-- molecular-weight = molecular weight of this compound in g/mol
-- chemical-formula = chemical formula of the compound (e.g. C3H7NO2)
-- cas-number = Chemical Abstracts Service (http://www.cas.org/)
--     database number for this compound (e.g. 56-41-7)
-- nat-prod = biological source information if this is a natural product
--***************************************************************************

BIND-chemsource ::= SEQUENCE {
    names SET OF VisibleString,
    smiles-string VisibleString OPTIONAL,
    chemical-formula VisibleString OPTIONAL,
    molecular-weight RealVal-Units OPTIONAL,
    cas-number VisibleString OPTIONAL,
    nat-prod BioSource OPTIONAL
} 



--* Identifiers *

--***************************************************************************
-- General sequence or domain identifier
-- Field description for BIND-id
--***************************************************************************
-- gi = NCBI integer accession number (optional only for sequence data with
--      no NCBI database identifier).
--      NOTE: gi is stored so that a BIND-object refers to a constant sequence
--      molecule. This is necessary to maintain data integrity of Seq-loc's
--      also stored in the BIND database.
-- di = domain accession number (from the domain split database)
-- other = open field for other possible NCBI defined pointers
--      (if possible, equivalent GenBank accession number to this
--      gi should be stored here as well)
--      NOTE: there is a field for gi in a Seq-id, but it should not be used
--      in this object
--***************************************************************************

BIND-id ::= SEQUENCE {
    gi Geninfo-id OPTIONAL,
    di Domain-id OPTIONAL,
    other SET OF Seq-id OPTIONAL
}

Geninfo-id ::= INTEGER
Domain-id ::= INTEGER



--***************************************************************************
-- Pointer to a small molecule database
-- Field description for BIND-small-molecule-id
--***************************************************************************
-- internal = id number of an internally kept record of a chemical compound
-- other-db = generic pointer to any other database (e.g. Japanese LIGAND db)
--     Contains the name of the database, an integer pointer and/or a string
--     pointer.
--***************************************************************************

BIND-small-molecule-id ::= CHOICE {
    internal Internal-small-molecule-id,
    other-db BIND-other-db
}

Internal-small-molecule-id ::= INTEGER

BIND-other-db ::= SEQUENCE {
    dbname VisibleString,
    intp INTEGER OPTIONAL,
    strp VisibleString OPTIONAL
}

--*************************************************
--* Interaction Description (in BIND-Interaction) *
--*************************************************

--***************************************************************************
-- Full description of an interaction
-- Field description for BIND-descr
--***************************************************************************
-- simple-descr = text description of this interaction
-- place = description of cellular place of interaction
-- cond = binding conditions/experimental conditions
-- cons = conserved sequence comment
-- binding-sites = location of binding sites on object A and B
-- action = list of chemical actions that can occur in this interaction
-- state = list of chemical states of 'a' and 'b' as well as required
--         state for interaction to occur
-- intramolecular = only relevant if 'a' and 'b' refer to the same molecule.
--         TRUE if the interaction is intramolecular
--***************************************************************************

BIND-descr ::= SEQUENCE {
    simple-descr VisibleString OPTIONAL,
    place SEQUENCE OF BIND-place OPTIONAL,
    cond BIND-condition-set OPTIONAL,
    cons BIND-cons-seq-set OPTIONAL,
    binding-sites BIND-loc OPTIONAL,
    action BIND-action-set OPTIONAL,
    state BIND-state-descr OPTIONAL,
    intramolecular BOOLEAN DEFAULT FALSE

} 
--* Cellular Interaction Place (in BIND-descr) *

--***************************************************************************
-- Location of interaction with respect to the cell
-- Field description for BIND-place
--***************************************************************************
-- bpid = internal BIND place ID number
-- gen-place = general cellular locations where this interaction takes place
--         (computer readable)
-- spec-place = specific text locations of the interaction
--         (human readable)
-- source = empirical evidence references (publications)
-- descr = text description (e.g. method of finding interaction place)
--***************************************************************************

BIND-place ::= SEQUENCE {
    bpid BIND-place-id OPTIONAL,
    gen-place BIND-gen-place-set,
    spec-place BIND-spec-place-set OPTIONAL,
    source BIND-pub-set OPTIONAL,
    descr VisibleString OPTIONAL
}

BIND-place-id ::= INTEGER

--***************************************************************************
-- General start and end places for an interaction
-- Field description for BIND-gen-place-set
--***************************************************************************
-- start = general place in the cell where this interaction takes place
-- end = general place in the cell where this interaction ends
--       (e.g. for translocation)
-- descr = text description (e.g. mechanism of translocation)
--***************************************************************************

BIND-gen-place-set ::= SEQUENCE {
    start BIND-gen-place,
    end BIND-gen-place OPTIONAL,
    descr VisibleString OPTIONAL
}

--***************************************************************************
-- General cellular place where this interaction takes place
-- This object is meant to be computer readable for e.g. a pathway
-- drawing program. Further cell locations are not enumerated because
-- there are too many in biology.
-- Field description for BIND-gen-place
--***************************************************************************
-- An enumeration of general cell places
-- not-specified = not-specified
-- extracellular = extracellular
-- cytoplasm = in cytoplasm
-- organelle = in an organelle
-- nucleus = in nucleus
-- membr-cell-cyt = on the cytoplasmic side of the cell membrane
-- membr-cell-in = in the cell membrane
-- membr-cell-ext = on the surface of the cell membrane
-- membr-outer-peri = on the periplasmic side of the outer membrane
-- membr-outer-in = in the outer membrane
-- membr-outer-ext = on the surface of the outer membrane
-- cellwall-cell = on the inside surface of cell wall
-- cellwall-in = in the cell wall
-- cellwall-ext = on the outside surface of the cell wall
-- other = other location - see text description in BIND-gen-loc-set
--***************************************************************************

BIND-gen-place ::= ENUMERATED {
    not-specified (0),
    extracellular (1),
    cytoplasm (2),
    organelle (3),
    nucleus (4),
    membr-cell-cyt (5),
    membr-cell-in (6),
    membr-cell-ext (7),
    membr-outer-peri (8),
    membr-outer-in (9),
    membr-outer-ext (10),
    cellwall-cell (11),
    cellwall-in (12),
    cellwall-ext (13),
    nuclear-envelope (14),
    perinuclear-space (15),
    nuc-inner-membr (16),
    nuc-outer-membr (17),
    nucleolus (18),
    chromatin (19),
    er (20),
    smooth-er (21),
    rough-er (22),
    golgi (23),
    cis-golgi (24),
    trans-golgi (25),
    vacuole (26),
    lysosome (27),
    peroxisome (28),
    mitochondrion (29),
    mito-outer-membr (30),
    mito-inner-membr (31),
    mito-christae (32),
    mito-matrix (33),
    chloroplast (34),
    chlor-inner-membr (35),
    chlor-outer-membr (36),
    thylakoid (37),
    grana (38),
    stroma (39),
    centrosome (40),
    centriole (41),
    other (255)
} 

BIND-gen-place-expanded ::= CHOICE {
    not-specified NULL,
    extracellular NULL,
    cytoplasm NULL,
    cell-wall BIND-gen-place-membr-descr,
    outer-membrane BIND-gen-place-membr-descr,
    cytoplasmic-membrane BIND-gen-place-membr-descr,
    organelle-unknown BIND-gen-place-membr-descr,
    organelle-other BIND-gen-place-membr-descr,
    nucleus BIND-gen-place-membr-descr,
    er-general BIND-gen-place-membr-descr,
    er-smooth BIND-gen-place-membr-descr,
    er-rough BIND-gen-place-membr-descr,
    golgi BIND-gen-place-membr-descr,
    cis-golgi BIND-gen-place-membr-descr,
    trans-golgi BIND-gen-place-membr-descr,
    vacuole BIND-gen-place-membr-descr,
    lysosome BIND-gen-place-membr-descr,
    peroxisome BIND-gen-place-membr-descr,
    endosome BIND-gen-place-membr-descr,
    mito-general NULL,
    mito-outer-membrane BIND-gen-place-membr-descr,
    mito-inner-membrane BIND-gen-place-membr-descr
}

BIND-gen-place-membr-descr ::= ENUMERATED {
    not-specified (0),
    outer-surface (1),
    within (2),
    inner-surface (3),
    lumen (4)
} 

--***************************************************************************
-- Specific start and end places for an interaction
-- (Human readable)
-- Field description for BIND-spec-place
--***************************************************************************
-- start = specific location where this interaction takes place
--         (e.g. trans golgi, basal membrane, inner mitochondrial space, etc.)
-- end = specific location where this interaction ends
--***************************************************************************

BIND-spec-place-set ::= SEQUENCE {
    start VisibleString,
    end VisibleString OPTIONAL
}

--* Interaction conditions (in BIND-descr) *

--***************************************************************************
-- A set of experimental conditions.
-- Field description for BIND-conditions-set
--***************************************************************************
-- max-icid = the highest icid used in this set
-- conditions = set of BIND-condition objects
--***************************************************************************

BIND-condition-set ::= SEQUENCE {
    max-icid Internal-conditions-id,
    conditions SEQUENCE OF BIND-condition

} 
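
Because max-icid records the highest internal condition id used in a set, a new condition can be given an id without scanning the existing ones. The Python sketch below shows one way this might work; the specification does not prescribe how ids are allocated, so sequential allocation, the dictionary layout, and the name `add_condition` are all assumptions.

```python
def add_condition(condition_set, condition):
    """Append a condition to the set and return its newly assigned icid.

    Assumes ids are handed out sequentially starting from max_icid + 1,
    which the specification itself does not require.
    """
    icid = condition_set["max_icid"] + 1
    condition_set["conditions"].append({**condition, "icid": icid})
    condition_set["max_icid"] = icid
    return icid


if __name__ == "__main__":
    cset = {"max_icid": 0, "conditions": []}
    print(add_condition(cset, {"general": "in-vitro", "system": "two-hybrid-test"}))
    print(add_condition(cset, {"general": "in-vivo", "system": "immunoprecipitation"}))
```

Other objects in an interaction record can then refer to a condition by its icid, and the id stays unique within the set even if earlier conditions are later removed.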



--***************************************************************************
-- An experimental condition that has been used to observe
-- this interaction. Interaction conclusion must be reproducible
-- using this information.
-- Field description for BIND-conditions
--***************************************************************************
-- icid = internal condition id
-- general = list of possible general experimental conditions
-- system = experimental system used
-- exp-seq = experimental sequence used if different from actual sequence
--         (e.g. HIS tagged sequence)
-- descr = text description (e.g. if 'other' is specified
--         in conditions or system)
-- source = empirical evidence
--***************************************************************************

BIND-condition ::= SEQUENCE {
    icid Internal-conditions-id,
    general ENUMERATED {
        in-vitro (0),
        in-vivo (1),
        in-situ (2),
        in-silico (3),
        other (255)
    },
    system BIND-experimental-system,
    exp-seq Bioseq OPTIONAL,
    descr VisibleString OPTIONAL,
    source BIND-pub-set OPTIONAL
}

Internal-conditions-id ::= INTEGER



BIND-experimental-system ::= INTEGER { 
    not-specified (0),
    alanine-scanning (1),
    affinity-chromatography (2),
    atomic-force-microscopy (3),
    autoradiography (4),
    competition-binding (5),
    cross-linking (6),
    deuterium-hydrogen-exchange (7),
    electron-microscopy (8),
    electron-spin-resonance (9),
    elisa (10),
    equilibrium-dialysis (11),
    fluorescence-anisotropy (12),
    footprinting (13),
    gel-retardation-assays (14),
    gel-filtration-chromatography (15),
    hybridization (16),
    immunoblotting (17),
    immunoprecipitation (18),
    immunostaining (19),
    interaction-adhesion-assay (20),
    light-scattering (21),
    mass-spectrometry (22),
    membrane-filtration (23),
    monoclonal-antibody-blockade (24),
    nuclear-translocation-assay (25),
    phage-display (26),
    reconstitution (27),
    resonance-energy-transfer (28),
    site-directed-mutagenesis (29),
    sucrose-gradient-sedimentation (30),
    surface-plasmon-resonance-chip (31),
    transient-coexpression (32),
    three-dimensional-structure (33),
    two-hybrid-test (34),
    other (255)
} 



--* Interaction conserved sequence comment (in BIND-descr) *

--***************************************************************************
-- Conserved sequence comment set
-- Only relevant for biological sequences.
-- (e.g. Derived from multiple alignment information)
-- Field description for BIND-cons-seq-set
--***************************************************************************
-- a = conserved sequence comment for molecule 'a'
-- b = conserved sequence comment for molecule 'b'
--***************************************************************************

BIND-cons-seq-set ::= SEQUENCE {
    a BIND-conserved-seq OPTIONAL,
    b BIND-conserved-seq OPTIONAL
}

--***************************************************************************
-- Conserved sequence comment
-- Alignment data is not stored here, only the conclusion from it.
-- Field description for BIND-conserved-seq
--***************************************************************************
-- seq-el = sequence elements that have been shown to be conserved
-- descr = text description (e.g. method of determining conserved sequence)
-- source = empirical evidence
--***************************************************************************

BIND-conserved-seq ::= SEQUENCE {
    seq-el Seq-loc,
    descr VisibleString OPTIONAL,
    source BIND-pub-set OPTIONAL

} 

--*******************************************************************
--* Binding location on molecules in an interaction (in BIND-descr) *
--*******************************************************************

--***************************************************************************
-- Binding location on a BIND-object
-- Field description for BIND-loc
--***************************************************************************
-- detailed = atomic level detail of interaction sites
-- general = sequence element level description of interaction sites
-- source = empirical evidence
--***************************************************************************

BIND-loc ::= SEQUENCE {
    detailed Biostruc OPTIONAL,
    general BIND-loc-gen OPTIONAL,
    source BIND-pub-set OPTIONAL
}

--***************************************************************************
-- General binding location on a BIND-object
-- Field description for BIND-loc-gen
--***************************************************************************
-- a-sites = list of binding sites on object A
-- b-sites = list of binding sites on object B
-- bound = list of sequence elements from A and B that are bound together
--***************************************************************************

BIND-loc-gen ::= SEQUENCE {
    a-sites BIND-loc-site-set OPTIONAL,
    b-sites BIND-loc-site-set OPTIONAL,
    bound SEQUENCE OF BIND-loc-pair OPTIONAL

} 

--***************************************************************************
-- A graph describing which sites on A bind to which sites on B
-- BIND-loc-site objects are nodes in the graph
-- BIND-loc-pair objects are edges in the graph
-- Field description for BIND-loc-site
--***************************************************************************
-- slid = internal ID of this sequence element
-- site = a sequence element (point or interval)
-- condition = this binding site seen only under certain experimental conditions
-- sub-unit = if a or b is a molecular complex, specifies which sub-unit the site is on.
-- descr = description of this binding site
-- Field description for BIND-loc-pair
-- a-slid = the Seq-loc pointed to by this ID is connected to...
-- b-slid = the Seq-loc pointed to by this ID
--***************************************************************************

BIND-loc-site-set ::= SEQUENCE {
    max-slid BIND-Seq-loc-id,
    sites SEQUENCE OF BIND-loc-site
}

BIND-loc-site ::= SEQUENCE {
    slid BIND-Seq-loc-id,
    site Seq-loc,
    condition BIND-condition-dependency OPTIONAL,
    sub-unit BIND-complex-subunit OPTIONAL,
    descr VisibleString OPTIONAL
}

BIND-loc-pair ::= SEQUENCE {
    a-slid BIND-Seq-loc-id,
    b-slid BIND-Seq-loc-id

} 

BIND-Seq-loc-id ::= INTEGER 
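
The site and pair objects above form a bipartite graph: sites are nodes identified by slid, and each pair is an edge from a site on A to a site on B. The following Python sketch resolves such pairs into an adjacency map under stated assumptions; the dictionary layout, the tuple form of each pair, and the function name `bound_partners` are hypothetical.

```python
def bound_partners(a_sites, b_sites, bound):
    """Return {a_slid: [b_slid, ...]} for the binding-site graph.

    a_sites and b_sites are lists of dicts with a 'slid' key (the nodes);
    bound is a list of (a_slid, b_slid) tuples (the edges).
    Raises ValueError if an edge refers to a slid that is not in its set.
    """
    a_ids = {site["slid"] for site in a_sites}
    b_ids = {site["slid"] for site in b_sites}
    adjacency = {slid: [] for slid in sorted(a_ids)}
    for a_slid, b_slid in bound:
        if a_slid not in a_ids or b_slid not in b_ids:
            raise ValueError(f"dangling pair ({a_slid}, {b_slid})")
        adjacency[a_slid].append(b_slid)
    return adjacency


if __name__ == "__main__":
    a = [{"slid": 1}, {"slid": 2}]
    b = [{"slid": 1}]
    print(bound_partners(a, b, [(1, 1), (2, 1)]))
```

Note that slids are scoped to their own site set in this sketch, so slid 1 on A and slid 1 on B are different nodes.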

--***********************************************
--* Interaction chemical action (in BIND-descr) *
--***********************************************

--***************************************************************************
-- A set of chemical actions
-- Chemical actions mediated by a molecule (object 'a' or 'b') in the
-- interaction (a set because a kinase may phosphorylate a protein multiple
-- times)
-- Field description for BIND-action-set
--***************************************************************************
-- max-iaid = the highest iaid used in this set
-- actions = set of BIND-action objects
--***************************************************************************

BIND-action-set ::= SEQUENCE {
    max-iaid Internal-action-id,
    actions SEQUENCE OF BIND-action

} 



-- A chemical action
-- Field description for BIND-action
-- iaid = internal action id (unique identifier for this action in a set)
-- descr = text description (e.g. if 'other' is specified for type)
-- direction = direction of chemical action
-- type = type of chemical action
-- result = the product(s) of this chemical action
-- NOTE: this field holds the exact chemical form that is produced, and is
-- used by reference by the next interaction acting on the "product".
-- For a biopolymer this holds the atoms&bonds representation of the
-- molecule.
-- diff = the atomic level detail of differences created by this action
-- signal = more general kinetics, signal transduction
-- kinetics = chemical action kinetics
-- conditions = link to experimental conditions used to find this action,
-- e.g. if there were multiple experimental conditions stored in
-- this interaction record and this action was only seen using
-- some of them.
-- source = empirical evidence
-- sub-unit-a = if a is a molecular complex, specifies the sub-unit to which
-- this chemical action applies
-- sub-unit-b = if b is a molecular complex, specifies the sub-unit to which
-- this chemical action applies
-- active-site = which site on the acting molecule is performing the chemical action
--***************************************************************************



BIND-action ::= SEQUENCE {
iaid Internal-action-id,
descr VisibleString OPTIONAL,
direction BIND-direction,
type BIND-action-type,
result SEQUENCE OF BIND-object OPTIONAL,
diff Biostruc-feature-set OPTIONAL,
signal BIND-signal OPTIONAL,
kinetics BIND-kinetics OPTIONAL,
conditions SEQUENCE OF BIND-condition-dependency OPTIONAL,
source BIND-pub-set OPTIONAL,
sub-unit-a BIND-complex-subunit OPTIONAL,
sub-unit-b BIND-complex-subunit OPTIONAL,
active-site BIND-active-site OPTIONAL
} 

Internal-action-id ::= INTEGER

BIND-direction ::= ENUMERATED {
none (0),
a-to-a (1),
a-to-b (2),
b-to-b (3),
b-to-a (4),
other (255)
}

BIND-active-site ::= CHOICE {
slid BIND-Seq-loc-id,
site BIND-loc-site-set
} 

--***************************************************************************
-- The type of action and object of that action
-- Action type     object of that action
-- add             BIND-object or NULL
-- remove          BIND-object or NULL
-- cut-seq         Seq-loc or NULL
-- Field description for BIND-action-type
--***************************************
-- not-specified = action is not-specified (unknown)
-- none = no chemical action, but e.g. kinetics information needs to be stored
-- (action is known to be nothing)
-- add = add an object (e.g. phosphate) to an object
-- remove = remove an object (e.g. phosphate) from an object
-- break = non-sequence cut action - e.g. small molecule hydrolysis
-- cut-seq = cut a sequence, location may be specified
-- (e.g. restriction enzyme)
-- change-conformation = a change in conformation of a molecule
-- (e.g. hck protein -> phosphorylation causes conformational change)
-- change-configuration = a change in configuration of a molecule
-- (e.g. by an epimerase or isomerase)
-- change-other = another type of change (e.g. metal ion exchange)
-- other = another action
-- Field description for BIND-action-object
--*****************************************
-- none = no action object
-- object = any BIND-object that is added or removed (e.g. phosphate)
-- location = location where a sequence was cut
--***************************************************************************



BIND-action-type ::= CHOICE {
not-specified NULL,
none NULL,
add BIND-action-object,
remove BIND-action-object,
break NULL,
cut-seq BIND-action-object,
change-conformation NULL, 
change-configuration NULL, 
change-other NULL, 
other NULL 
} 

BIND-action-object ::= CHOICE { 
none NULL, 
object BIND-object, 
location Seq-loc 
} 
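
-- Illustrative sketch only (not part of the specification): how the two
-- CHOICE types above might be written as values in standard ASN.1 value
-- notation. The value names are hypothetical. A hydrolysis carries no
-- action object, while an 'add' whose added object is not recorded uses
-- the 'none' branch of BIND-action-object.
--   example-hydrolysis BIND-action-type ::= break : NULL
--   example-add BIND-action-type ::= add : none : NULL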



--***************************************************************************
-- A chemical signal description
-- A more general notion of kinetics describing signal transduction.
-- Field description for BIND-signal
--***********************************
-- action = signal modification
-- direction = direction of signal
-- factor = the factor of the amplification or the repression
-- descr = text description (e.g. if 'other' is specified)
--***************************************************************************



BIND-signal ::= SEQUENCE {
action ENUMERATED {
none (0),
amplify (1),
repress (2),
other (255)
},
direction BIND-direction,
factor RealVal-Units OPTIONAL,
descr VisibleString OPTIONAL 
} 
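
-- Illustrative sketch only: a hypothetical BIND-signal value in which 'a'
-- amplifies a signal passed to 'b'. The optional factor is omitted because
-- the layout of RealVal-Units is not given in this section; the value name
-- and description text are invented for the example.
--   example-signal BIND-signal ::= {
--     action amplify,
--     direction a-to-b,
--     descr "hypothetical kinase cascade step"
--   }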

--***************************************************************************
-- Chemical kinetics and thermodynamics data
-- Field description for BIND-kinetics
--*************************************
-- descr = optional text description of this object
-- kd = dissociation constant of interaction
-- km = Michaelis-Menten constant
-- vmax = max. velocity of reaction
-- rxn-order = reaction order
-- conc-a = concentration of 'a'
-- conc-b = concentration of 'b'
-- conc-a-bound = concentration of 'a' that is bound
-- conc-b-bound = concentration of 'b' that is bound
-- conc-a-unbound = concentration of 'a' that is not bound
-- conc-b-unbound = concentration of 'b' that is not bound
-- enz-activity-amp-factor = scalar amplification factor for enzyme kinetic activity
-- temp = temperature of the interaction system (observed)
-- ph = pH of the interaction system
-- half-life-a = 1/2 life for 'a'
-- half-life-b = 1/2 life for 'b'
-- buffer = buffer text description
-- delta-g = delta G (delta Gibbs free energy)
-- delta-s = delta S (delta entropy)
-- delta-h = delta H (delta enthalpy)
-- heat-capacity-a = heat capacity of 'a'
-- heat-capacity-b = heat capacity of 'b'
-- other = any other related values (e.g. k1, k2...)
-- source = empirical evidence
--***************************************************************************



BIND-kinetics ::= SEQUENCE {
descr VisibleString OPTIONAL,
kd RealVal-Units OPTIONAL,
km RealVal-Units OPTIONAL,
vmax RealVal-Units OPTIONAL,
rxn-order RealVal-Units OPTIONAL,
conc-a RealVal-Units OPTIONAL,
conc-b RealVal-Units OPTIONAL,
conc-a-bound RealVal-Units OPTIONAL,
conc-b-bound RealVal-Units OPTIONAL,
conc-a-unbound RealVal-Units OPTIONAL,
conc-b-unbound RealVal-Units OPTIONAL,
enz-activity-amp-factor RealVal-Units OPTIONAL,
temp RealVal-Units OPTIONAL,
ph RealVal-Units OPTIONAL,
half-life-a RealVal-Units OPTIONAL,
half-life-b RealVal-Units OPTIONAL,
buffer VisibleString OPTIONAL,
delta-g RealVal-Units OPTIONAL,
delta-s RealVal-Units OPTIONAL,
delta-h RealVal-Units OPTIONAL,
heat-capacity-a RealVal-Units OPTIONAL,
heat-capacity-b RealVal-Units OPTIONAL,
other SEQUENCE OF BIND-kinetics-other OPTIONAL,
source BIND-pub-set OPTIONAL

} 
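
-- Illustrative sketch only: every field of the kinetics record is OPTIONAL,
-- so a record may carry just its text fields. The value name and both
-- strings are hypothetical; numeric fields are left out because the layout
-- of RealVal-Units is not given in this section.
--   example-kinetics BIND-kinetics ::= {
--     descr "hypothetical binding measurement",
--     buffer "hypothetical buffer description"
--   }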

BIND-kinetics-other ::= SEQUENCE {
descr VisibleString, 
value RealVal-Units 
} 

-- Dependency of interaction on an experimental condition
-- The experimental condition(s) used to observe this chemical action.
-- Pointer to a BIND-condition object. Uniquely locates an experimental
-- condition by Interaction-id then by Internal-conditions-id.
-- Field description for BIND-condition-dependency
-- from-iid = interaction that contains the experimental condition
-- cond = internal condition ID number of the condition description

BIND-condition-dependency ::= SEQUENCE {
from-iid Interaction-id,
cond Internal-conditions-id

} 

--*******************************************************************
-- * Interaction - chemical state for 'a' and/or 'b' (in BIND-descr) *
--*******************************************************************

--***************************************************************************
-- Chemical state and required chemical state for objects 'a' and 'b'
-- The chemical state in the BIND-state-descr is "the chemistry" of 'a' or 'b'
-- in this particular molecular interaction. The chemistry is referred to by
-- reference, typically to another interaction record's
-- interaction:action:result which encodes a BIND-object that is the
-- "bio-processed" form of 'a' or 'b' used in this interaction.
-- Field description for BIND-state-descr
-- a = list of possible chemical states for 'a' that can undergo this
-- interaction
-- a-required-state = the state that 'a' in the above list of possible states
-- is required to assume before interaction takes place.
-- b = list of possible chemical states for 'b' that can undergo this
-- interaction
-- b-required-state = the state that 'b' in the above list of possible states
-- is required to assume before interaction takes place.
-- NOTE: multiple required states are only used if a or b is a molecular complex
-- and the state of more than one sub-unit needs to be described.
--***************************************************************************

BIND-state-descr ::= SEQUENCE {
a BIND-state-set OPTIONAL,
a-required-state SEQUENCE OF BIND-required-state OPTIONAL,
b BIND-state-set OPTIONAL,
b-required-state SEQUENCE OF BIND-required-state OPTIONAL

} 

--***************************************************************************
-- A set of chemical states
-- e.g. multiple phosphorylations on a protein, all of which may be active
-- in this interaction record.
-- Field description for BIND-state-set
-- max-isid = highest Internal-state-id used in this set
-- states = list of possible chemical states

BIND-state-set ::= SEQUENCE {
max-isid Internal-state-id,
states SEQUENCE OF BIND-state

} 

Internal-state-id ::= INTEGER



-- Interaction chemical state (in BIND-descr)
-- A chemical state
-- Points to the chemistry of a molecule, if known, by reference to an
-- interaction:action with an explicit 'result' field.
-- This allows conversion of a sequence to chemistry with modifications -
-- can describe a protein that has been phosphorylated at a certain residue.
-- Here we can exactly state the chemistry of a molecule as it is found in
-- the cell, even though the top BIND-object may only refer to the GI.
-- Field description for BIND-state
-- isid = Internal-state-id (unique for each state in a BIND-state-set)
-- activity-level = general activity of molecule
-- cause = sequence of actions from this or other interactions that bring
-- about this state
-- descr = text description (e.g. method used to determine this state)
-- source = empirical evidence for this state
-- sub-unit = if a or b is a molecular complex, specifies the sub-unit to which
-- this state applies
--***************************************************************************



BIND-state ::= SEQUENCE {
isid Internal-state-id,
activity-level ENUMERATED {
not-specified (0),
inactive (1),
very-low (2),
low (3),
medium (4),
medium-high (5),
high (6),
very-high (7),
extreme (8),
active (9),
other (255)
},
cause SEQUENCE OF BIND-state-cause OPTIONAL,
descr VisibleString OPTIONAL,
source BIND-pub-set OPTIONAL,
sub-unit BIND-complex-subunit OPTIONAL

} 

--***************************************************************************
-- Cause of a chemical state
-- The chemical action from this or other interactions that directly brings
-- about this state.
-- References an external interaction:action uniquely.
-- The "cause" is really the Interaction:action pair elsewhere
-- in the database that is the most recent step in the biochemical
-- conversion that forms the biochemical entity in 'a' or 'b'.
-- Action and state are peer BIND-descr tags, this allows
-- a reference to causal 'action' within the chemical state.
-- Field description for BIND-state-cause
--***************************************
-- from-iid = interaction that contains the causal chemical action
-- cause = internal action ID number that caused this activity
--***************************************************************************



BIND-state-cause ::= SEQUENCE { 
from-iid Interaction-id, 
cause Internal-action-id
} 
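
-- Illustrative sketch only: a hypothetical BIND-state-cause value pointing
-- at action 2 of interaction record 101. Both numbers and the value name
-- are invented, and Interaction-id is assumed here to be an INTEGER like
-- the other id types in this section.
--   example-cause BIND-state-cause ::= { from-iid 101, cause 2 }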

--***************************************************************************
-- A required chemical state for interaction to take place
-- The state in the state set that is required for the interaction to take
-- place. Uniquely locates a chemical state within this interaction record
-- by Internal-state-id.
-- Field description for BIND-required-state
--*******************************************
-- isid = Internal-state-id of the required state. Points to
-- one chemical state in the BIND-state-set in the same record
-- descr = description of state requirement
-- source = empirical evidence
--***************************************************************************



BIND-required-state ::= SEQUENCE {
isid Internal-state-id,
descr VisibleString OPTIONAL,
source BIND-pub-set OPTIONAL
}

--***********************
-- * Molecular-Complex *
--***********************

--***************************************************************************
-- A set of Molecular Complexes
-- Field description for BIND-Complex-set
--****************************************
-- date = date this set of records was collected
-- database = name and description of database that this set comes from
-- complexes = set of molecular complex records
--***************************************************************************



BIND-Complex-set ::= SEQUENCE {
date Date OPTIONAL, 
database BIND-Database-site OPTIONAL, 
complexes SEQUENCE OF BIND-Molecular-Complex 
} 



--***************************************************************************
-- A molecular complex record
-- A collection of more than two interactions that form a complex.
-- i.e. Three or more BIND-objects that operate as a unit. It is a
-- useful shorthand when defining BIND pathways.
-- A molecular complex can also be defined if the interactions in it are not
-- completely known. Create interactions with molecule 'a' as the sub-unit of
-- the complex and molecule 'b' as 'not-specified' for all of the known
-- sub-units.
-- Field description for BIND-Molecular-Complex
--**********************************************
-- date = date of record entry
-- updates = a list of updates for the record
-- mcid = molecular complex accession number
-- descr = text description of complex (e.g. ribosome)
-- sub-num = total number of sub-units in this complex
-- sub-units = collection of BIND-objects in the complex
-- interaction-list = list of interactions in this complex
-- ordered = TRUE if order of interactions is known and
-- interaction-list is ordered in this way
-- complex-topology = a connectivity graph of the complex topology
-- with BIND-objects as nodes
-- exclusive-int = interactions that exclusively occur between subunits only
-- when part of this molecular complex
-- source = empirical evidence references
-- authors = person(s) who authored this record
-- priv = TRUE if this complex is private

BIND-Molecular-Complex ::= SEQUENCE {
date Date,
updates SEQUENCE OF BIND-update-object OPTIONAL,
mcid Molecular-Complex-id,
descr VisibleString OPTIONAL,
sub-num BIND-mol-sub-num,
sub-units SEQUENCE OF BIND-mol-object,
interaction-list SEQUENCE OF Interaction-id,
ordered BOOLEAN DEFAULT FALSE,
complex-topology SEQUENCE OF BIND-mol-object-pair OPTIONAL,
exclusive-int SEQUENCE OF BIND-Interaction OPTIONAL,
source BIND-pub-set,
authors SEQUENCE OF Author OPTIONAL,
priv BOOLEAN DEFAULT FALSE

} 

Molecular-Complex-id ::= INTEGER

-- specify a specific sub-unit of a complex
BIND-complex-subunit ::= SEQUENCE {
mcid Molecular-Complex-id,
bmoid BIND-mol-object-id

} 



--***************************************************************************
-- Sub unit numbers in a Molecular Complex
-- This number can be an integer or a fuzzy integer.
-- Field description for BIND-mol-sub-num
--****************************************
-- num = integer number of sub-units
-- num-fuzz = fuzzy integer number of sub-units (e.g. microtubule, virus)
--***************************************************************************

BIND-mol-sub-num ::= CHOICE {
num INTEGER,
num-fuzz Int-fuzz

} 



--***************************************************************************
-- A graph describing topology of a molecular complex
-- BIND-mol-object objects are nodes in the graph
-- BIND-mol-object-pair objects are edges in the graph
-- Field description for BIND-mol-object
--***************************************
-- bmoid = internal ID of the BIND-object
-- sub-unit = a sub-unit in a molecular complex
-- num = number of this sub-unit
-- Field description for BIND-mol-object-pair
-- a-bmoid = the sub-unit pointed to by this ID is connected to...
-- b-bmoid = the sub-unit pointed to by this ID

BIND-mol-object ::= SEQUENCE {
bmoid BIND-mol-object-id,
sub-unit BIND-object,
num BIND-mol-sub-num OPTIONAL
}

BIND-mol-object-pair ::= SEQUENCE {
a-bmoid BIND-mol-object-id,
b-bmoid BIND-mol-object-id
} 
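
-- Illustrative sketch only: a hypothetical complex-topology value for a
-- complex of three sub-units with ids 1, 2 and 3 joined in a chain
-- (1 to 2, then 2 to 3). Each pair is one edge of the connectivity graph;
-- the ids and the value name are invented for the example.
--   example-topology SEQUENCE OF BIND-mol-object-pair ::= {
--     { a-bmoid 1, b-bmoid 2 },
--     { a-bmoid 2, b-bmoid 3 }
--   }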

BIND-mol-object-id ::= INTEGER

--***********************************
-- * Biomolecular chemical pathway *
--***********************************

-- A set of Pathways
-- Field description for BIND-Pathway-set
-- date = date this set of records was collected
-- database = name and description of database that this set comes from
-- pathways = set of pathway records

BIND-Pathway-set ::= SEQUENCE {
date Date OPTIONAL,
database BIND-Database-site OPTIONAL,
pathways SEQUENCE OF BIND-Pathway
}

-- A pathway record.
-- A collection of more than two interactions that form a pathway,
-- i.e. Three or more BIND-objects that are generally free from each
-- other, but form a network of interactions.
-- Field description for BIND-Pathway
-- date = date of record entry
-- updates = a list of updates for the record
-- pid = pathway accession number
-- pathway = a collection of interactions and signal modification objects
-- descr = description of a pathway
-- source = empirical evidence references
-- authors = person(s) who authored this record
-- priv = TRUE if this pathway is private
--***************************************************************************

BIND-Pathway ::= SEQUENCE {
date Date,
updates SEQUENCE OF BIND-update-object OPTIONAL,
pid Pathway-id,
pathway SEQUENCE OF Interaction-id,
descr BIND-path-descr,
source BIND-pub-set,
authors SEQUENCE OF Author OPTIONAL,
priv BOOLEAN DEFAULT FALSE

} 

Pathway-id ::= INTEGER 

--***************************************************************************
-- Pathway description
-- Field description for BIND-path-descr
-- descr = text description of pathway
-- (e.g. lipid biosynthesis, bacterial chemotaxis, Ras pathway, etc.)
-- cell-cycle = stage of a cell cycle that this pathway is in effect
-- pathological-state = disease manifestation if this pathway is present
-- pathway-actions = list of chemical actions that occur in the pathway



BIND-path-descr ::= SEQUENCE { 

descr VisibleString OPTIONAL, 

cell-cycle SEQUENCE OF BIND-cellstage OPTIONAL, 
pathological-state SEQUENCE OF BIND-pathol-state OPTIONAL,
pathway-actions SEQUENCE OF BIND-state-cause OPTIONAL 
} 

BIND-pathol-state ::= SEQUENCE {
pathway-iid Interaction-id,
interaction CHOICE {
ablated NULL,
replaced-by Interaction-id
},

pathol-state VisibleString, 
descr VisibleString OPTIONAL, 
source BIND-pub-set 
}

--*********************************
-- * BIND Cross Reference System *
--*********************************

--********************************
-- * IID cross reference record *
--********************************

--***************************************************************************
-- Cross reference for IID searching
-- Field description for BIND-Iid-Cross-Ref
--******************************************
-- iid = interaction ID
-- iids = list of interactions that contain this iid
-- pids = list of pathways that contain this iid
-- mcids = list of molecular complexes that contain this iid
--***************************************************************************



BIND-Iid-Cross-Ref ::= SEQUENCE {
iid Interaction-id,
iids SEQUENCE OF Interaction-id OPTIONAL,
pids SEQUENCE OF Pathway-id OPTIONAL,
mcids SEQUENCE OF Molecular-Complex-id OPTIONAL

} 
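
-- Illustrative sketch only: a hypothetical IID cross-reference record
-- stating that interaction 7 is contained in pathway 3 and molecular
-- complex 12. All numbers and the value name are invented, and
-- Interaction-id is assumed here to be an INTEGER.
--   example-iid-xref BIND-Iid-Cross-Ref ::= {
--     iid 7,
--     pids { 3 },
--     mcids { 12 }
--   }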



--*********************************
-- * MCID cross reference record *
--*********************************

--***************************************************************************
-- Cross reference for MCID searching
-- Field description for BIND-Mcid-Cross-Ref
--*******************************************
-- mcid = Molecular Complex ID
-- iids = list of interactions that contain this mcid
-- pids = list of pathways that contain this mcid
-- mcids = list of molecular complexes that contain this mcid
--***************************************************************************



BIND-Mcid-Cross-Ref ::= SEQUENCE {
mcid Molecular-Complex-id,
iids SEQUENCE OF Interaction-id OPTIONAL,
pids SEQUENCE OF Pathway-id OPTIONAL,
mcids SEQUENCE OF Molecular-Complex-id OPTIONAL
} 



--**********************************
-- * GI/DI cross reference record *
--**********************************

--***************************************************************************
-- Cross reference for gi/di searching
-- Field description for BIND-Cross-Ref
--**************************************
-- gi = gi number
-- di = di number
-- iids = list of interactions that this gi is involved in
-- pids = list of pathways that this gi is involved in
-- mcids = list of molecular complexes that this gi is involved in
--***************************************************************************

BIND-Cross-Ref ::= SEQUENCE {
gi Geninfo-id DEFAULT 0,
di Domain-id DEFAULT 0,
iids SEQUENCE OF Interaction-id,
pids SEQUENCE OF Pathway-id OPTIONAL,
mcids SEQUENCE OF Molecular-Complex-id OPTIONAL
} 



--**************************************
-- * PMID/MUID cross reference record *
--**************************************

--***************************************************************************
-- Cross reference for pmid/muid searching
-- Field description for BIND-Pub-Cross-Ref
--******************************************
-- uid = muid or pmid
-- iids = list of interactions with this publication as a reference
-- pids = list of pathways with this publication as a reference
-- mcids = list of molecular complexes with this publication as a reference
--***************************************************************************



BIND-Pub-Cross-Ref ::= SEQUENCE { 
uid INTEGER, 

iids SEQUENCE OF Interaction-id, 

pids SEQUENCE OF Pathway-id OPTIONAL, 

mcids SEQUENCE OF Molecular-Complex-id OPTIONAL

} 

END 